Position dependent recognition of GNN nucleotide triplets by zinc fingers

ABSTRACT

The specificity of binding of a zinc finger to a triplet or quadruplet nucleotide target subsite depends upon the location of the zinc finger in a multifinger protein and, hence, upon the location of its target subsite within a larger target sequence. The present disclosure provides zinc finger amino acid sequences for recognition of triplet target subsites having the nucleotide G in the 5′-most position of the subsite, that have been optimized with respect to the location of the subsite within the target site. Accordingly, the disclosure provides finger position-specific amino acid sequences for the recognition of GNN target subsites. This allows the construction of multi-finger zinc finger proteins with improved affinity and specificity for their target sequences, as well as enhanced biological activity.

CROSS-REFERENCES TO RELATED APPLICATIONS

[0001] The present application is a continuation-in-part of copending U.S. patent application Ser. No. 09/535,008, filed Mar. 23, 2000, which application claims the benefit of U.S. provisional applications No. 60/126,238, filed Mar. 24, 1999, No. 60/126,239 filed Mar. 24, 1999, No. 60/146,595 filed Jul. 30, 1999 and No. 60/146,615 filed Jul. 30, 1999. The present application is also a continuation-in-part of copending U.S. patent application Ser. No. 09/716,637, filed Nov. 20, 2000. The disclosures of all of the aforementioned applications are hereby incorporated by reference in their entireties for all purposes.

BACKGROUND

[0002] Zinc finger proteins (ZFPs) are proteins that can bind to DNA in a sequence-specific manner. Zinc fingers were first identified in the transcription factor TFIIIA from the oocytes of the African clawed toad, Xenopus laevis. An exemplary motif characterizing one class of these proteins (C₂H₂ class) is -Cys-(X)₂₋₄-Cys-(X)₁₂-His-(X)₃₋₅-His (where X is any amino acid) (SEQ. ID. No:1). A single finger domain is about 30 amino acids in length, and several structural studies have demonstrated that it contains an alpha helix containing the two invariant histidine residues and two invariant cysteine residues in a beta turn co-ordinated through zinc. To date, over 10,000 zinc finger sequences have been identified in several thousand known or putative transcription factors. Zinc finger domains are involved not only in DNA recognition, but also in RNA binding and in protein-protein binding. Current estimates are that this class of molecules will constitute about 2% of all human genes.

[0003] The x-ray crystal structure of Zif268, a three-finger domain from a murine transcription factor, has been solved in complex with a cognate DNA sequence and shows that each finger can be superimposed on the next by a periodic rotation. The structure suggests that each finger interacts independently with DNA over 3 base-pair intervals, with side-chains at positions −1, 2, 3 and 6 on each recognition helix making contacts with their respective DNA triplet subsites. The amino terminus of Zif268 is situated at the 3′ end of the DNA strand with which it makes most contacts. Some zinc fingers can bind to a fourth base in a target segment. If the strand with which a zinc finger protein makes most contacts is designated the target strand, some zinc finger proteins bind to a three base triplet in the target strand and a fourth base on the nontarget strand. The fourth base is complementary to the base immediately 3′ of the three base subsite.

[0004] The structure of the Zif268-DNA complex also suggested that the DNA sequence specificity of a zinc finger protein might be altered by making amino acid substitutions at the four helix positions (−1, 2, 3 and 6) on each of the zinc finger recognition helices. Phage display experiments using zinc finger combinatorial libraries to test this observation were published in a series of papers in 1994 (Rebar et al., Science 263, 671-673 (1994); Jamieson et al., Biochemistry 33, 5689-5695 (1994); Choo et al., PNAS 91, 11163-11167 (1994)). Combinatorial libraries were constructed with randomized side-chains in either the first or middle finger of Zif268 and then used to select for an altered Zif268 binding site in which the appropriate DNA sub-site was replaced by an altered DNA triplet. Further, correlation between the nature of introduced mutations and the resulting alteration in binding specificity gave rise to a partial set of substitution rules for design of ZFPs with altered binding specificity.

[0005] Greisman & Pabo, Science 275, 657-661 (1997) discuss an elaboration of the phage display method in which each finger of Zif268 was successively randomized and selected for binding to a new triplet sequence. This paper reported selection of ZFPs for a nuclear hormone response element, a p53 target site and a TATA box sequence.

[0006] A number of papers have reported attempts to produce ZFPs to modulate particular target sites. For example, Choo et al., Nature 372, 645 (1994), report an attempt to design a ZFP that would repress expression of a bcr-abl oncogene. The target segment to which the ZFPs would bind was a nine-base sequence 5′GCA GAA GCC3′ chosen to overlap the junction created by a specific oncogenic translocation fusing the genes encoding bcr and abl. The intention was that a ZFP specific to this target site would bind to the oncogene without binding to abl or bcr component genes. The authors used phage display to screen a mini-library of variant ZFPs for binding to this target segment. A variant ZFP thus isolated was then reported to repress expression of a stably transfected bcr-abl construct in a cell line.

[0007] Pomerantz et al., Science 267, 93-96 (1995) reported an attempt to design a novel DNA binding protein by fusing two fingers from Zif268 with a homeodomain from Oct-1. The hybrid protein was then fused with a transcriptional activator for expression as a chimeric protein. The chimeric protein was reported to bind a target site representing a hybrid of the subsites of its two components. The authors then constructed a reporter vector containing a luciferase gene operably linked to a promoter and a hybrid site for the chimeric DNA binding protein in proximity to the promoter. The authors reported that their chimeric DNA binding protein could activate expression of the luciferase gene.

[0008] Liu et al., PNAS 94, 5525-5530 (1997) report forming a composite zinc finger protein by using a peptide spacer to link two component zinc finger proteins each having three fingers. The composite protein was then further linked to a transcriptional activation domain. It was reported that the resulting chimeric protein bound to a target site formed from the target segments bound by the two component zinc finger proteins. It was further reported that the chimeric zinc finger protein could activate transcription of a reporter gene when its target site was inserted into a reporter plasmid in proximity to a promoter operably linked to the reporter.

[0009] Choo et al., WO 98/53058, WO 98/53059, and WO 98/53060 (1998) discuss selection of zinc finger proteins to bind to a target site within the HIV Tat gene. Choo et al. also discuss selection of a zinc finger protein to bind to a target site encompassing a site of a common mutation in the oncogene ras. The target site within ras was thus constrained by the position of the mutation.

[0010] Previously-disclosed methods for the design of sequence-specific zinc finger proteins have often been based on modularity of individual zinc fingers; i.e., the ability of a zinc finger to recognize the same target subsite regardless of the location of the finger in a multi-finger protein. Although, in many instances, a zinc finger retains the same sequence specificity regardless of its location within a multi-finger protein, in certain cases the sequence specificity of a zinc finger depends on its position. For example, it is possible for a finger to recognize a particular triplet sequence when it is present as finger 1 of a three-finger protein, but to recognize a different triplet sequence when present as finger 2 of a three-finger protein.

[0011] Attempts to address situations in which a zinc finger behaves in a non-modular fashion (i.e., its sequence specificity depends upon its location in a multi-finger protein) have, to date, involved strategies employing randomization of key binding residues in multiple adjacent zinc fingers, followed by selection. See, for example, Isalan et al. (2001) Nature Biotechnol. 19:656-660. However, methods for rational design of polypeptides containing non-modular zinc fingers have not heretofore been described.

SUMMARY

[0012] The present disclosure provides compositions comprising and methods involving position dependent recognition of GNN nucleotide triplets by zinc fingers.

[0013] Thus, provided herein is a zinc finger protein that binds to a target site, said zinc finger protein comprising a first (F1), a second (F2), and a third (F3) zinc finger, ordered F1, F2, F3 from N-terminus to C-terminus, said target site comprising, in 3′ to 5′ direction, a first (S1), a second (S2), and a third (S3) target subsite, each target subsite having the nucleotide sequence GNN, wherein if S1 comprises GAA, F1 comprises the amino acid sequence QRSNLVR; if S2 comprises GAA, F2 comprises the amino acid sequence QSGNLAR; if S3 comprises GAA, F3 comprises the amino acid sequence QSGNLAR; if S1 comprises GAG, F1 comprises the amino acid sequence RSDNLAR; if S2 comprises GAG, F2 comprises the amino acid sequence RSDNLAR; if S3 comprises GAG, F3 comprises the amino acid sequence RSDNLTR; if S1 comprises GAC, F1 comprises the amino acid sequence DRSNLTR; if S2 comprises GAC, F2 comprises the amino acid sequence DRSNLTR; if S3 comprises GAC, F3 comprises the amino acid sequence DRSNLTR; if S1 comprises GAT, F1 comprises the amino acid sequence QSSNLAR; if S2 comprises GAT, F2 comprises the amino acid sequence TSGNLVR; if S3 comprises GAT, F3 comprises the amino acid sequence TSANLSR; if S1 comprises GGA, F1 comprises the amino acid sequence QSGHLAR; if S2 comprises GGA, F2 comprises the amino acid sequence QSGHLQR; if S3 comprises GGA, F3 comprises the amino acid sequence QSGHLQR; if S1 comprises GGG, F1 comprises the amino acid sequence RSDHLAR; if S2 comprises GGG, F2 comprises the amino acid sequence RSDHLSR; if S3 comprises GGG, F3 comprises the amino acid sequence RSDHLSR; if S1 comprises GGC, F1 comprises the amino acid sequence DRSHLTR; if S2 comprises GGC, F2 comprises the amino acid sequence DRSHLAR; if S1 comprises GGT, F1 comprises the amino acid sequence QSSHLTR; if S2 comprises GGT, F2 comprises the amino acid sequence TSGHLSR; if S3 comprises GGT, F3 comprises the amino acid sequence TSGHLVR; if S1 comprises GCA, F1 comprises the amino acid sequence QSGSLTR; if S2 comprises GCA, F2 comprises QSGDLTR; if S3 comprises GCA, F3 comprises QSGDLTR; if S1 comprises GCG, F1 comprises the amino acid sequence RSDDLTR; if S2 comprises GCG, F2 comprises the amino acid sequence RSDDLQR; if S3 comprises GCG, F3 comprises the amino acid sequence RSDDLTR; if S1 comprises GCC, F1 comprises the amino acid sequence ERGTLAR; if S2 comprises GCC, F2 comprises the amino acid sequence DRSDLTR; if S3 comprises GCC, F3 comprises the amino acid sequence DRSDLTR; if S1 comprises GCT, F1 comprises the amino acid sequence QSSDLTR; if S2 comprises GCT, F2 comprises the amino acid sequence QSSDLTR; if S3 comprises GCT, F3 comprises the amino acid sequence QSSDLQR; if S1 comprises GTA, F1 comprises the amino acid sequence QSGALTR; if S2 comprises GTA, F2 comprises the amino acid sequence QSGALAR; if S1 comprises GTG, F1 comprises the amino acid sequence RSDALTR; if S2 comprises GTG, F2 comprises the amino acid sequence RSDALSR; if S3 comprises GTG, F3 comprises the amino acid sequence RSDALTR; if S1 comprises GTC, F1 comprises the amino acid sequence DRSALAR; if S2 comprises GTC, F2 comprises the amino acid sequence DRSALAR; and if S3 comprises GTC, F3 comprises the amino acid sequence DRSALAR.

[0014] Also provided are methods of designing a zinc finger protein comprising a first (F1), a second (F2), and a third (F3) zinc finger, ordered F1, F2, F3 from N-terminus to C-terminus that binds to a target site comprising, in 3′ to 5′ direction, a first (S1), a second (S2), and a third (S3) target subsite, each target subsite having the nucleotide sequence GNN, the method comprising the steps of (a) selecting the F1 zinc finger such that it binds to the S1 target subsite, wherein if S1 comprises GAA, F1 comprises the amino acid sequence QRSNLVR; if S1 comprises GAG, F1 comprises the amino acid sequence RSDNLAR; if S1 comprises GAC, F1 comprises the amino acid sequence DRSNLTR; if S1 comprises GAT, F1 comprises the amino acid sequence QSSNLAR; if S1 comprises GGA, F1 comprises the amino acid sequence QSGHLAR; if S1 comprises GGG, F1 comprises the amino acid sequence RSDHLAR; if S1 comprises GGC, F1 comprises the amino acid sequence DRSHLTR; if S1 comprises GGT, F1 comprises the amino acid sequence QSSHLTR; if S1 comprises GCA, F1 comprises QSGSLTR; if S1 comprises GCG, F1 comprises RSDDLTR; if S1 comprises GCC, F1 comprises ERGTLAR; if S1 comprises GCT, F1 comprises the amino acid sequence QSSDLTR; if S1 comprises GTA, F1 comprises the amino acid sequence QSGALTR; if S1 comprises GTG, F1 comprises the amino acid sequence RSDALTR; if S1 comprises GTC, F1 comprises the amino acid sequence DRSALAR; (b) selecting the F2 zinc finger such that it binds to the S2 target subsite, wherein if S2 comprises GAA, F2 comprises the amino acid sequence QSGNLAR; if S2 comprises GAG, F2 comprises the amino acid sequence RSDNLAR; if S2 comprises GAC, F2 comprises the amino acid sequence DRSNLTR; if S2 comprises GAT, F2 comprises the amino acid sequence TSGNLVR; if S2 comprises GGA, F2 comprises the amino acid sequence QSGHLQR; if S2 comprises GGG, F2 comprises the amino acid sequence RSDHLSR; if S2 comprises GGC, F2 comprises the amino acid sequence DRSHLAR; if S2 comprises GGT, F2 comprises the amino acid sequence TSGHLSR; if S2 comprises GCA, F2 comprises the amino acid sequence QSGDLTR; if S2 comprises GCG, F2 comprises RSDDLQR; if S2 comprises GCC, F2 comprises the amino acid sequence DRSDLTR; if S2 comprises GCT, F2 comprises the amino acid sequence QSSDLTR; if S2 comprises GTA, F2 comprises the amino acid sequence QSGALAR; if S2 comprises GTG, F2 comprises the amino acid sequence RSDALSR; if S2 comprises GTC, F2 comprises the amino acid sequence DRSALAR; and (c) selecting the F3 zinc finger such that it binds to the S3 target subsite, wherein if S3 comprises GAA, F3 comprises the amino acid sequence QSGNLAR; if S3 comprises GAG, F3 comprises the amino acid sequence RSDNLTR; if S3 comprises GAC, F3 comprises the amino acid sequence DRSNLTR; if S3 comprises GAT, F3 comprises the amino acid sequence TSANLSR; if S3 comprises GGA, F3 comprises the amino acid sequence QSGHLQR; if S3 comprises GGG, F3 comprises RSDHLSR; if S3 comprises GGT, F3 comprises the amino acid sequence TSGHLVR; if S3 comprises GCA, F3 comprises the amino acid sequence QSGDLTR; if S3 comprises GCG, F3 comprises the amino acid sequence RSDDLTR; if S3 comprises GCC, F3 comprises the amino acid sequence DRSDLTR; if S3 comprises GCT, F3 comprises the amino acid sequence QSSDLQR; if S3 comprises GTG, F3 comprises RSDALTR; and if S3 comprises GTC, F3 comprises the amino acid sequence DRSALAR;

[0015] thereby designing a zinc finger protein that binds to a target site.
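The selection steps (a) through (c) amount to a lookup keyed on both the subsite sequence and the finger position. The following is a minimal Python sketch of that lookup, not a definitive implementation. The table holds only a subset of the pairings recited above, and the names POSITION_SPECIFIC_HELICES and select_helices are hypothetical, introduced here for illustration.

```python
# Position-specific recognition sequences, copied from the pairings recited
# above. Only a subset of the GNN triplets is included in this sketch.
# The names used here are hypothetical and are not part of the disclosure.
POSITION_SPECIFIC_HELICES = {
    # subsite: (sequence for F1, sequence for F2, sequence for F3)
    "GAA": ("QRSNLVR", "QSGNLAR", "QSGNLAR"),
    "GAG": ("RSDNLAR", "RSDNLAR", "RSDNLTR"),
    "GAT": ("QSSNLAR", "TSGNLVR", "TSANLSR"),
    "GGA": ("QSGHLAR", "QSGHLQR", "QSGHLQR"),
    "GGG": ("RSDHLAR", "RSDHLSR", "RSDHLSR"),
    "GCA": ("QSGSLTR", "QSGDLTR", "QSGDLTR"),
    "GCG": ("RSDDLTR", "RSDDLQR", "RSDDLTR"),
    "GTG": ("RSDALTR", "RSDALSR", "RSDALTR"),
}


def select_helices(s1, s2, s3):
    """Return the (F1, F2, F3) sequences for subsites S1, S2 and S3.

    S1 is bound by F1, S2 by F2 and S3 by F3, so the finger position
    (0, 1 or 2) selects which column of the table is used.
    """
    helices = []
    for position, subsite in enumerate((s1, s2, s3)):
        key = subsite.upper()
        if key not in POSITION_SPECIFIC_HELICES:
            raise KeyError(f"subsite {subsite!r} is not in this sketch's table")
        helices.append(POSITION_SPECIFIC_HELICES[key][position])
    return tuple(helices)


# The same GAA subsite yields a different sequence at F1 than at F2 or F3.
print(select_helices("GAA", "GAA", "GAA"))
```

Keying the table on the subsite and indexing by finger position keeps the position dependence explicit: a purely modular design would need only one sequence per subsite.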

[0016] In certain embodiments of the zinc finger proteins and methods described herein, S1 comprises GAA and F1 comprises the amino acid sequence QRSNLVR. In other embodiments, S2 comprises GAA and F2 comprises the amino acid sequence QSGNLAR. In other embodiments, S3 comprises GAA and F3 comprises the amino acid sequence QSGNLAR. In other embodiments, S1 comprises GAG and F1 comprises the amino acid sequence RSDNLAR. In other embodiments, S2 comprises GAG and F2 comprises the amino acid sequence RSDNLAR. In other embodiments, S3 comprises GAG and F3 comprises the amino acid sequence RSDNLTR. In other embodiments, S1 comprises GAC and F1 comprises the amino acid sequence DRSNLTR. In other embodiments, S2 comprises GAC and F2 comprises the amino acid sequence DRSNLTR. In other embodiments, S3 comprises GAC and F3 comprises the amino acid sequence DRSNLTR. In other embodiments, S1 comprises GAT and F1 comprises the amino acid sequence QSSNLAR. In other embodiments, S2 comprises GAT and F2 comprises the amino acid sequence TSGNLVR. In other embodiments, S3 comprises GAT and F3 comprises the amino acid sequence TSANLSR. In other embodiments, S1 comprises GGA and F1 comprises the amino acid sequence QSGHLAR. In other embodiments, S2 comprises GGA and F2 comprises the amino acid sequence QSGHLQR. In other embodiments, S3 comprises GGA and F3 comprises the amino acid sequence QSGHLQR. In other embodiments, S1 comprises GGG and F1 comprises the amino acid sequence RSDHLAR. In other embodiments, S2 comprises GGG and F2 comprises the amino acid sequence RSDHLSR. In other embodiments, S3 comprises GGG and F3 comprises the amino acid sequence RSDHLSR. In other embodiments, S1 comprises GGC and F1 comprises the amino acid sequence DRSHLTR. In other embodiments, S2 comprises GGC and F2 comprises the amino acid sequence DRSHLAR. In other embodiments, S1 comprises GGT and F1 comprises the amino acid sequence QSSHLTR. In other embodiments, S2 comprises GGT and F2 comprises the amino acid sequence TSGHLSR. In other embodiments, S3 comprises GGT and F3 comprises the amino acid sequence TSGHLVR. In other embodiments, S1 comprises GCA and F1 comprises the amino acid sequence QSGSLTR. In other embodiments, S2 comprises GCA and F2 comprises the amino acid sequence QSGDLTR. In other embodiments, S3 comprises GCA and F3 comprises the amino acid sequence QSGDLTR. In other embodiments, S1 comprises GCG and F1 comprises the amino acid sequence RSDDLTR. In other embodiments, S2 comprises GCG and F2 comprises the amino acid sequence RSDDLQR. In other embodiments, S3 comprises GCG and F3 comprises the amino acid sequence RSDDLTR. In other embodiments, S1 comprises GCC and F1 comprises the amino acid sequence ERGTLAR. In other embodiments, S2 comprises GCC and F2 comprises the amino acid sequence DRSDLTR. In other embodiments, S3 comprises GCC and F3 comprises the amino acid sequence DRSDLTR. In other embodiments, S1 comprises GCT and F1 comprises the amino acid sequence QSSDLTR. In other embodiments, S2 comprises GCT and F2 comprises the amino acid sequence QSSDLTR. In other embodiments, S3 comprises GCT and F3 comprises the amino acid sequence QSSDLQR. In other embodiments, S1 comprises GTA and F1 comprises the amino acid sequence QSGALTR. In other embodiments, S2 comprises GTA and F2 comprises the amino acid sequence QSGALAR. In other embodiments, S1 comprises GTG and F1 comprises the amino acid sequence RSDALTR. In other embodiments, S2 comprises GTG and F2 comprises the amino acid sequence RSDALSR. In other embodiments, S3 comprises GTG and F3 comprises the amino acid sequence RSDALTR. In other embodiments, S1 comprises GTC and F1 comprises the amino acid sequence DRSALAR. In other embodiments, S2 comprises GTC and F2 comprises the amino acid sequence DRSALAR. In other embodiments, S3 comprises GTC and F3 comprises the amino acid sequence DRSALAR.

[0017] Also provided are polypeptides comprising any of the zinc finger proteins described herein. In certain embodiments, the polypeptide further comprises at least one functional domain. Also provided are polynucleotides encoding any of the polypeptides described herein. Thus, also provided are nucleic acids encoding zinc fingers, including all of the zinc fingers described above.

[0018] Also provided are segments of a zinc finger comprising a sequence of seven contiguous amino acids as shown herein. Also provided are nucleic acids encoding any of these segments and zinc fingers comprising the same.

[0019] Also provided are zinc finger proteins comprising first, second and third zinc fingers. The first, second and third zinc fingers comprise respectively first, second and third segments of seven contiguous amino acids as shown herein. Also provided are nucleic acids encoding such zinc finger proteins.

BRIEF DESCRIPTION OF THE DRAWINGS

[0020] FIG. 1 shows results of site selection analysis of two representative zinc finger proteins (leftmost 4 columns) and measurements of binding affinity for each of these proteins to their intended target sequences and to variant target sequences (rightmost 3 columns). Analysis of ZFP1 is shown in the upper portion of the figure and analysis of ZFP2 is shown in the lower portion of the figure. For the site selection analyses, the amino acid sequences of residues −1 through +6 of the recognition helix of each of the three component zinc fingers (F3, F2 and F1) are shown across the top row; the intended target sequence (divided into finger-specific target subsites) is shown across the second row, and a summary of the sequences bound is shown in the third row. Data for F3 is shown in the second column, data for F2 is shown in the third column, and data for F1 is shown in the fourth column.

[0021] For the binding affinity analyses, the designed target sequence for each ZFP (“cognate”) and two related sequences (“Mt”) are shown (column 6), along with the K_d for binding of the ZFP to each of these sequences (column 7).

[0022] FIG. 2 shows amino acid sequences of zinc finger recognition regions (amino acids −1 through +6 of the recognition helix) that bind to each of the 16 GNN triplet subsites. Three amino acid sequences are shown for each trinucleotide subsite; these correspond to optimal amino acid sequences for recognition of the subsite from each of the three positions (finger 1, F1; finger 2, F2; or finger 3, F3) in a three-finger zinc finger protein. Amino acid sequences are from N-terminal to C-terminal; nucleotide sequences are from 5′ to 3′.

[0023] Also shown are site selection results for each of the 48 position-dependent GNN-recognizing zinc fingers. These show the number of times a particular nucleotide was present, at a given position, in a collection of oligonucleotide sequences bound by the finger. For example, out of 15 oligonucleotides bound by a zinc finger protein with the amino acid sequence QSGHLAR present at the finger 1 (F1) position, 15 contained a G in the 5′-most position of the subsite, 15 contained a G in the middle position of the subsite, while, at the 3′-most position of the subsite, 10 contained an A, 3 contained a G and 2 contained a T. Accordingly, this particular amino acid sequence is optimal for binding a GGA triplet from the F1 position.
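The tally described for QSGHLAR can be illustrated with a short Python sketch. The list of 15 subsite sequences below is hypothetical: it is constructed only to reproduce the counts quoted above (15 G, 15 G, and 10 A, 3 G, 2 T) and is not taken from the figure. The function names are likewise assumptions made for illustration.

```python
from collections import Counter


def tally_positions(bound_subsites):
    """Count how often each nucleotide occurs at each subsite position."""
    length = len(bound_subsites[0])
    return [Counter(site[i] for site in bound_subsites) for i in range(length)]


def most_frequent_triplet(tallies):
    """Join the most frequent nucleotide at each position, 5' to 3'."""
    return "".join(counts.most_common(1)[0][0] for counts in tallies)


# Hypothetical subsite sequences, built to match the counts quoted above
# for QSGHLAR at the F1 position; they are not data from the figure.
selected = ["GGA"] * 10 + ["GGG"] * 3 + ["GGT"] * 2

tallies = tally_positions(selected)
for label, counts in zip(("5'-most", "middle", "3'-most"), tallies):
    print(label, dict(counts))
print(most_frequent_triplet(tallies))
```

The most frequent nucleotide at each position is only a summary; the full counts show how strongly each position is specified, which is the information the site selection results convey.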

[0024] FIGS. 3A, 3B and 3C show site selection data indicating positional dependence of GCA-, GAT- and GGT-binding zinc fingers. The first and fourth (where applicable) rows of each figure show portions of the amino acid sequence of a designed zinc finger protein. Amino acid residues −1 through +6 of each α-helix are listed from left to right. The second and fifth (where applicable) rows show the target sequence, divided into three triplet subsites, one for each finger of the protein shown in the first and fourth (where applicable) rows, respectively. The third and sixth (where applicable) rows show the distribution of nucleotides in the oligonucleotides obtained by site selection with the proteins shown in the first and fourth (where applicable) rows, respectively. FIG. 3A shows data for fingers designed to bind GCA; FIG. 3B shows data for fingers designed to bind GAT; FIG. 3C shows data for fingers designed to bind GGT.

[0025] FIGS. 4A and 4B show properties of the engineered ZFP EP2C. FIG. 4A shows site selection data. The first row provides the amino acid sequences of residues −1 through +6 of the recognition helices for each of the three zinc fingers of the EP2C protein. The second row shows the target sequence (5′ to 3′), with the distribution of nucleotides in the oligonucleotides obtained by site selection indicated below the target sequence.

[0026] FIG. 4B shows in vitro and in vivo assays for the binding specificity of EP2C. The first three columns show in vitro measurements of binding affinity of EP2C to its intended target sequence and several related sequences. The first column gives the name of each sequence (2C0 is the intended target sequence, compare to FIG. 4A). The second column shows the nucleotide sequence of various target sequences, with differences from the intended target sequence (2C0) highlighted. The third column shows the K_d (in nM) for binding of EP2C to each of the target sequences. K_d values were determined by gel shift assays, using 2-fold dilution series of EP2C. The right side of the figure (fourth column and bar graph) shows relative luciferase activities (normalized to β-galactosidase levels) in stable cell lines in which expression of EP2C is inducible. Cells were co-transfected with a vector containing a luciferase coding region under the transcriptional control of the target sequence shown in the same row of the figure, and a control vector encoding β-galactosidase. Luciferase and β-galactosidase levels were measured after induction of EP2C expression. Triplicate samples were assayed and the standard deviations are shown in the bar graph. pGL3 is a luciferase-encoding vector lacking EP2C target sequences. 3B is another negative control, in which luciferase expression is under transcriptional control of sequences (3B) unrelated to the EP2C target sequence.

DEFINITIONS

[0027] A zinc finger DNA binding protein is a protein or segment within a larger protein that binds DNA in a sequence-specific manner as a result of stabilization of protein structure through coordination of a zinc ion. The term zinc finger DNA binding protein is often abbreviated as zinc finger protein or ZFP.

[0028] Zinc finger proteins can be engineered to recognize a selected target sequence in a nucleic acid. Any method known in the art or disclosed herein can be used to construct an engineered zinc finger protein or a nucleic acid encoding an engineered zinc finger protein. These include, but are not limited to, rational design, selection methods (e.g., phage display), random mutagenesis, combinatorial libraries, computer design, affinity selection, use of databases matching zinc finger amino acid sequences with target subsite nucleotide sequences, cloning from cDNA and/or genomic libraries, and synthetic constructions. An engineered zinc finger protein can comprise a new combination of naturally-occurring zinc finger sequences. Methods for engineering zinc finger proteins are disclosed in co-owned WO 00/41566 and WO 00/42219; as well as in WO 98/53057; WO 98/53058; WO 98/53059 and WO 98/53060; the disclosures of which are hereby incorporated by reference in their entireties. Methods for identifying preferred target sequences, and for engineering zinc finger proteins to bind to such preferred target sequences, are disclosed in co-owned WO 00/42219.

[0029] A designed zinc finger protein is a protein not occurring in nature whose design/composition results principally from rational criteria. Rational criteria for design include application of substitution rules and computerized algorithms for processing information in a database storing information of existing ZFP designs and binding data.

[0030] A selected zinc finger protein is a protein not found in nature whose production results primarily from an empirical process such as phage display.

[0031] The term naturally-occurring is used to describe an object that can be found in nature as distinct from being artificially produced by man. For example, a polypeptide or polynucleotide sequence that is present in an organism (including viruses) that can be isolated from a source in nature and which has not been intentionally modified by man in the laboratory is naturally-occurring. Generally, the term naturally-occurring refers to an object as present in a non-pathological (undiseased) individual, such as would be typical for the species.

[0032] A nucleic acid is operably linked when it is placed into a functional relationship with another nucleic acid sequence. For instance, a promoter or enhancer is operably linked to a coding sequence if it increases the transcription of the coding sequence. Operably linked means that the DNA sequences being linked are typically contiguous and, where necessary to join two protein coding regions, contiguous and in reading frame. However, since enhancers generally function when separated from the promoter by up to several kilobases or more and intronic sequences may be of variable lengths, some polynucleotide elements may be operably linked but not contiguous.

[0033] A specific binding affinity between, for example, a ZFP and a specific target site means a binding affinity of at least 1×10⁶ M⁻¹.
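Because the tables and figures report dissociation constants (K_d, in nM) while this definition is stated in M⁻¹, a unit conversion is needed to compare the two. The Python sketch below assumes that "binding affinity" in the definition is read as an association constant K_a = 1/K_d; that reading, and the function names, are assumptions made here for illustration.

```python
# Threshold from the definition above, in M^-1.
SPECIFIC_BINDING_THRESHOLD = 1e6


def kd_nm_to_ka(kd_nm):
    """Convert a dissociation constant in nM to an association constant in M^-1.

    Assumes K_a = 1 / K_d, with K_d first converted from nM to M.
    """
    if kd_nm <= 0:
        raise ValueError("K_d must be positive")
    return 1.0 / (kd_nm * 1e-9)


def is_specific(kd_nm):
    """Return True if the K_d corresponds to at least 1e6 M^-1."""
    return kd_nm_to_ka(kd_nm) >= SPECIFIC_BINDING_THRESHOLD


# Illustrative K_d values (hypothetical, not taken from the tables).
for kd in (10, 5000):
    print(kd, "nM ->", f"{kd_nm_to_ka(kd):.2e}", "M^-1, specific:", is_specific(kd))
```

Under this reading, the threshold of 1×10⁶ M⁻¹ corresponds to a K_d of about 1000 nM, so smaller K_d values indicate binding above the threshold.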

[0034] The terms “modulating expression,” “inhibiting expression” and “activating expression” of a gene refer to the ability of a zinc finger protein to activate or inhibit transcription of a gene. Activation includes prevention of subsequent transcriptional inhibition (i.e., prevention of repression of gene expression) and inhibition includes prevention of subsequent transcriptional activation (i.e., prevention of gene activation). Modulation can be assayed by determining any parameter that is indirectly or directly affected by the expression of the target gene. Such parameters include, e.g., changes in RNA or protein levels, changes in protein activity, changes in product levels, changes in downstream gene expression, changes in reporter gene transcription (luciferase, CAT, beta-galactosidase, GFP (see, e.g., Misteli & Spector, Nature Biotechnology 15:961-964 (1997))); changes in signal transduction, phosphorylation and dephosphorylation, receptor-ligand interactions, second messenger concentrations (e.g., cGMP, cAMP, IP3, and Ca²⁺), cell growth, neovascularization, in vitro, in vivo, and ex vivo. Such functional effects can be measured by any means known to those skilled in the art, e.g., measurement of RNA or protein levels, measurement of RNA stability, identification of downstream or reporter gene expression, e.g., via chemiluminescence, fluorescence, colorimetric reactions, antibody binding, inducible markers, ligand binding assays; changes in intracellular second messengers such as cGMP and inositol triphosphate (IP3); changes in intracellular calcium levels; cytokine release, and the like.

[0035] A “regulatory domain” refers to a protein or a protein subsequence that has transcriptional modulation activity. Typically, a regulatory domain is covalently or non-covalently linked to a ZFP to modulate transcription. Alternatively, a ZFP can act alone, without a regulatory domain, or with multiple regulatory domains to modulate transcription.

[0036] A D-able subsite within a target site has the motif 5′NNGK3′. A target site containing one or more such motifs is sometimes described as a D-able target site. A zinc finger appropriately designed to bind to a D-able subsite is sometimes referred to as a D-able finger. Likewise, a zinc finger protein containing at least one finger designed or selected to bind to a target site including at least one D-able subsite is sometimes referred to as a D-able zinc finger protein.
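As an illustration of the 5′NNGK3′ motif, the Python sketch below scans a sequence for every four-base window that matches it. It relies on the standard IUPAC nucleotide codes (N is any base, K is G or T); the function name and the example sequence are hypothetical.

```python
import re

# 5'NNGK3' in IUPAC codes: N = A, C, G or T; K = G or T.
# The lookahead lets overlapping windows be reported individually.
D_ABLE_MOTIF = re.compile(r"(?=([ACGT]{2}G[GT]))")


def find_d_able_subsites(sequence):
    """Return (offset, four-base window) pairs matching 5'NNGK3'.

    The sequence is read 5' to 3'; offsets are 0-based.
    """
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in D_ABLE_MOTIF.finditer(seq)]


# Hypothetical example sequence, written 5' to 3'.
print(find_d_able_subsites("GCAGTGG"))
# → [(1, 'CAGT'), (3, 'GTGG')]
```

A target site for which this function returns at least one match would be a D-able target site in the sense defined above.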

DETAILED DESCRIPTION

[0037] I. General

[0038] Tables 1-5 list a collection of nonnaturally occurring zinc finger protein sequences and their corresponding target sites. The first column of each table is an internal reference number. The second column lists a 9 or 10 base target site bound by a three-finger zinc finger protein, with the target sites listed in 5′ to 3′ orientation. The third column provides SEQ ID NOs for the target site sequences listed in column 2. The fourth, sixth and eighth columns list amino acid residues from the first, second and third fingers, respectively, of a zinc finger protein which recognizes the target sequence listed in the second column. For each finger, seven amino acids, occupying positions −1 to +6 of the finger, are listed. The numbering convention for zinc fingers is defined below. Columns 5, 7 and 9 provide SEQ ID NOs for the amino acid sequences listed in columns 4, 6 and 8, respectively. The final column of each table lists the binding affinity (i.e., the K_d in nM) of the zinc finger protein for its target site. Binding affinities are measured as described below.

[0039] Each finger binds to a triplet of bases within a corresponding target sequence. The first finger binds to the first triplet starting from the 3′ end of a target site, the second finger binds to the second triplet, and the third finger binds the third (i.e., the 5′-most) triplet of the target sequence. For example, the RSDSLTS finger (SEQ ID NO:646) of SBS# 201 (Table 2) binds to 5′TTG3′, the ERSTLTR finger (SEQ ID NO:851) binds to 5′GCC3′ and the QRADLRR finger (SEQ ID NO:1056) binds to 5′GCA3′.
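
The finger-to-triplet assignment just described can be sketched in Python as follows. The function name `assign_triplets` is hypothetical, and the 9-base example site is simply the three triplets quoted above reassembled in 5′ to 3′ order; it is not copied from Table 2. For simplicity the sketch accepts only sites whose length is exactly three bases per finger.

```python
def assign_triplets(target_5to3, n_fingers=3):
    """Map finger number (1 = first finger) to its triplet, written 5'->3'.

    Finger 1 takes the 3'-most triplet; the last finger takes the 5'-most.
    Hypothetical helper, for illustration only.
    """
    site = target_5to3.replace(" ", "").upper()
    if len(site) != 3 * n_fingers:
        raise ValueError("expected exactly three bases per finger")
    # Triplets in the order they appear when reading the site 5' -> 3'.
    triplets = [site[i:i + 3] for i in range(0, len(site), 3)]
    # Reverse so that finger 1 is paired with the 3'-most triplet.
    return dict(enumerate(reversed(triplets), start=1))


if __name__ == "__main__":
    # Site reassembled from the triplets named in the text: GCA GCC TTG
    print(assign_triplets("GCAGCCTTG"))
```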

[0040] Table 6 lists a collection of consensus sequences for zinc fingers and the target sites bound by such sequences. Conventional one letter amino acid codes are used to designate amino acids occupying consensus positions. The symbol “X” designates a nonconsensus position that can in principle be occupied by any amino acid. In most zinc fingers of the C₂H₂ type, binding specificity is principally conferred by residues −1, +2, +3 and +6. Accordingly, consensus sequences determining binding specificity typically include at least these residues. Consensus sequences are useful for designing zinc fingers to bind to a given target sequence. Residues occupying other positions can be selected based on sequences in Tables 1-5, or other known zinc finger sequences. Alternatively, these positions can be randomized with a plurality of candidate amino acids and screened against one or more target sequences to refine or improve binding specificity. In general, the same consensus sequence can be used for design of a zinc finger regardless of the relative position of that finger in a multi-finger zinc finger protein. For example, the sequence RXDNXXR can be used to design an N-terminal, central or C-terminal finger of a three-finger protein. However, some consensus sequences are most suitable for designing a zinc finger to occupy a particular position in a multi-finger protein. For example, the consensus sequence RXDHXXQ is most suitable for designing a C-terminal finger of a three-finger protein.

[0041] II. Characteristics of Zinc Finger Proteins

[0042] Zinc finger proteins are formed from zinc finger components. For example, zinc finger proteins can have one to thirty-seven fingers, commonly having 2, 3, 4, 5 or 6 fingers. A zinc finger protein recognizes and binds to a target site (sometimes referred to as a target segment) that represents a relatively small subsequence within a target gene. Each component finger of a zinc finger protein can bind to a subsite within the target site. The subsite includes a triplet of three contiguous bases all on the same strand (sometimes referred to as the target strand). The subsite may or may not also include a fourth base on the opposite strand that is the complement of the base immediately 3′ of the three contiguous bases on the target strand. In many zinc finger proteins, a zinc finger binds to its triplet subsite substantially independently of other fingers in the same zinc finger protein. Accordingly, the binding specificity of a zinc finger protein containing multiple fingers is usually approximately the aggregate of the specificities of its component fingers. For example, if a zinc finger protein is formed from first, second and third fingers that individually bind to triplets XXX, YYY, and ZZZ, the binding specificity of the zinc finger protein is 3′XXX YYY ZZZ5′.

[0043] The relative order of fingers in a zinc finger protein from N-terminal to C-terminal determines the relative order of triplets in the 3′ to 5′ direction in the target. For example, if a zinc finger protein comprises from N-terminal to C-terminal first, second and third fingers that individually bind, respectively, to triplets 5′GAC3′, 5′GTA3′ and 5′GGC3′ then the zinc finger protein binds to the target segment 3′CAGATGCGG5′. If the zinc finger protein comprises the fingers in another order, for example, second finger, first finger, third finger, then the zinc finger protein binds to a target segment comprising a different permutation of triplets, in this example, 3′ATGCAGCGG5′ (see Berg & Shi, Science 271, 1081-1085 (1996)). The assessment of binding properties of a zinc finger protein as the aggregate of its component fingers may, in some cases, be influenced by context-dependent interactions of multiple fingers binding in the same protein.
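
The permutation example above can be checked with a short Python sketch. It assumes the simple aggregate model (each finger binds its triplet independently), and the function name `target_from_fingers` is hypothetical.

```python
def target_from_fingers(triplets_n_to_c):
    """Build the target segment from per-finger triplets.

    `triplets_n_to_c` lists each finger's triplet (written 5'->3'),
    N-terminal finger first. Returns (site 5'->3', same site read 3'->5').
    Hypothetical helper; assumes fingers bind independently.
    """
    # The N-terminal finger binds the 3'-most triplet, so reading 5'->3'
    # the triplets appear in reverse finger order.
    five_to_three = "".join(reversed(triplets_n_to_c))
    three_to_five = five_to_three[::-1]
    return five_to_three, three_to_five


if __name__ == "__main__":
    # Fingers in the order first, second, third
    print(target_from_fingers(["GAC", "GTA", "GGC"]))
    # Fingers in the order second, first, third
    print(target_from_fingers(["GTA", "GAC", "GGC"]))
```

Read 3′ to 5′, the two calls give CAGATGCGG and ATGCAGCGG, matching the segments quoted in the paragraph.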

[0044] Two or more zinc finger proteins can be linked to have a target specificity that is the aggregate of that of the component zinc finger proteins (see e.g., Kim & Pabo, PNAS 95, 2812-2817 (1998)). For example, a first zinc finger protein having first, second and third component fingers that respectively bind to XXX, YYY and ZZZ can be linked to a second zinc finger protein having first, second and third component fingers with binding specificities, AAA, BBB and CCC. The binding specificity of the combined first and second proteins is thus 3′XXXYYYZZZ_AAABBBCCC5′, where the underline indicates a short intervening region (typically 0-5 bases of any type). In this situation, the target site can be viewed as comprising two target segments separated by an intervening segment.

[0045] Linkage can be accomplished using any of the following peptide linkers: TGEKP (SEQ. ID. No:2) (Liu et al., 1997, supra); (G4S)ₙ (SEQ. ID. No:3) (Kim et al., PNAS 93, 1156-1160 (1996)); GGRRGGGS (SEQ. ID. No:4); LRQRDGERP (SEQ. ID. No:5); LRQKDGGGSERP (SEQ. ID. No:6); and LRQKD(G3S)₂ERP (SEQ. ID. No:7). Alternatively, flexible linkers can be rationally designed using computer programs capable of modeling both DNA-binding sites and the peptides themselves, or by phage display methods. In a further variation, noncovalent linkage can be achieved by fusing two zinc finger proteins with domains promoting heterodimer formation of the two zinc finger proteins. For example, one zinc finger protein can be fused with fos and the other with jun (see Barbas et al., WO 95/19431).

[0046] Linkage of two zinc finger proteins is advantageous for conferring a unique binding specificity within a mammalian genome. A typical mammalian diploid genome consists of 3×10⁹ bp. Assuming that the four nucleotides A, C, G, and T are randomly distributed, a given 9 bp sequence is present ˜23,000 times. Thus a ZFP recognizing a 9 bp target with absolute specificity would have the potential to bind to ˜23,000 sites within the genome. An 18 bp sequence is present once in 3.4×10¹⁰ bp, or about once in a random DNA sequence whose complexity is ten times that of a mammalian genome.
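
The arithmetic behind these estimates can be reproduced with the Python sketch below. The quoted figures are obtained when a site is counted on both strands of the duplex; that factor of two is inferred from the numbers rather than stated in the text, and the function names are hypothetical.

```python
GENOME_BP = 3e9  # genome size used in the text


def expected_occurrences(site_len, genome_bp=GENOME_BP, strands=2):
    """Expected number of occurrences of one specific site in random DNA.

    Assumes equal, independent base frequencies; `strands=2` (an assumption)
    counts the site on either strand.
    """
    return strands * genome_bp / 4 ** site_len


def bp_per_occurrence(site_len, strands=2):
    """Length of random DNA, in bp, expected to contain the site once."""
    return 4 ** site_len / strands


if __name__ == "__main__":
    print(round(expected_occurrences(9)))        # roughly 23,000 sites for 9 bp
    print(f"{bp_per_occurrence(18):.2e}")        # roughly 3.4e10 bp per 18 bp site
    print(bp_per_occurrence(18) / GENOME_BP)     # about ten genome equivalents
```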

[0047] A component finger of a zinc finger protein typically contains about 30 amino acids and has the following motif (N-C):

Cys-(X)₂₋₄-Cys-X.X.X.X.X.X.X.X.X.X.X.X-His-(X)₃₋₅-His   (SEQ. ID. No:8)

[0048] The two invariant histidine residues and two invariant cysteine residues in a single beta turn are co-ordinated through zinc (see, e.g., Berg & Shi, Science 271, 1081-1085 (1996)). The above motif shows a numbering convention that is standard in the field for the region of a zinc finger conferring binding specificity. The amino acid on the left (N-terminal side) of the first invariant His residue is assigned the number +6, and other amino acids further to the left are assigned successively decreasing numbers. The alpha helix begins at residue 1 and extends to the residue following the second conserved histidine. The entire helix is therefore of variable length, between 11 and 13 residues.

[0049] The process of designing or selecting a nonnaturally occurring or variant ZFP typically starts with a natural ZFP as a source of framework residues. The process of design or selection serves to define nonconserved positions (i.e., positions −1 to +6) so as to confer a desired binding specificity. One suitable ZFP is the DNA binding domain of the mouse transcription factor Zif268. The DNA binding domain of this protein has the amino acid sequence:

[0050] YACPVESCDRRFSRSDELTRHIRIHTGQKP (F1) (SEQ. ID. No:9)

[0051] FQCRICMRNFSRSDHLTTHIRTHTGEKP (F2) (SEQ. ID. No:10)

[0052] FACDICGRKFARSDERKRHTKIHLRQK (F3) (SEQ. ID. No:11)

[0053] and binds to a target 5′ GCG TGG GCG 3′ (SEQ ID No:12).
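
As a sketch of the numbering convention, the Python snippet below locates the C₂H₂ motif in each Zif268 finger listed above and returns the seven residues at positions −1 to +6, i.e., the seven residues immediately N-terminal to the first invariant histidine. The regular expression is a direct transcription of the motif; the names `FINGER`, `POSITIONS` and `recognition_helix` are hypothetical.

```python
import re

# Cys-(X)2-4-Cys-(X)12-His-(X)3-5-His; the last seven of the twelve X
# residues (positions -1, 1, 2, 3, 4, 5, 6) are captured.
FINGER = re.compile(r"C.{2,4}C.{5}(.{7})H.{3,5}H")
POSITIONS = (-1, 1, 2, 3, 4, 5, 6)


def recognition_helix(finger_seq):
    """Return {position: residue} for positions -1 to +6 of one finger."""
    match = FINGER.search(finger_seq.upper())
    if match is None:
        raise ValueError("no C2H2 motif found")
    return dict(zip(POSITIONS, match.group(1)))


if __name__ == "__main__":
    zif268 = {
        "F1": "YACPVESCDRRFSRSDELTRHIRIHTGQKP",
        "F2": "FQCRICMRNFSRSDHLTTHIRTHTGEKP",
        "F3": "FACDICGRKFARSDERKRHTKIHLRQK",
    }
    for name, seq in zif268.items():
        helix = recognition_helix(seq)
        print(name, "".join(helix.values()), "position +6 =", helix[6])
```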

[0054] Another suitable natural zinc finger protein as a source of framework residues is Sp-1. The Sp-1 sequence used for construction of zinc finger proteins corresponds to amino acids 531 to 624 in the Sp-1 transcription factor. This sequence is 94 amino acids in length. The amino acid sequence of Sp-1 is as follows:

[0055] PGKKKQHICHIQGCGKVYGKTSHLRAHLRWHTGERP

[0056] FMCTWSYCGKRFTRSDELQRHKRTHTGEKK

[0057] FACPECPKRFMRSDHLSKHIKTHQNKKG (SEQ. ID. No:13)

[0058] Sp-1 binds to a target site 5′GGG GCG GGG3′ (SEQ ID No:14).

[0059] An alternate form of Sp-1, an Sp-1 consensus sequence, has the following amino acid sequence:

[0060] meklngsgd

[0061] PGKKKQHACPECGKSFSKSSHLRAHQRTHTGERP

[0062] YKCPECGKSFSRSDELQRHQRTHTGEKP

[0063] YKCPECGKSFSRSDHLSKHQRTHQNKKG (SEQ. ID. No:15) (lower case letters are a leader sequence from Shi & Berg, Chemistry and Biology 1, 83-89 (1995)). The optimal binding sequence for the Sp-1 consensus sequence is 5′GGGGCGGGG3′ (SEQ ID No:16). Other suitable ZFPs are described below.

[0064] There are a number of substitution rules that assist rational design of some zinc finger proteins (see Desjarlais & Berg, PNAS 90, 2256-2260 (1993); Choo & Klug, PNAS 91, 11163-11167 (1994); Desjarlais & Berg, PNAS 89, 7345-7349 (1992); Jamieson et al., supra; Choo et al., WO 98/53057, WO 98/53058; WO 98/53059; WO 98/53060). Many of these rules are supported by site-directed mutagenesis of the three-finger domain of the ubiquitous transcription factor, Sp-1 (Desjarlais and Berg, 1992; 1993). One of these rules is that a 5′ G in a DNA triplet can be bound by a zinc finger incorporating arginine at position 6 of the recognition helix. Another substitution rule is that a G in the middle of a subsite can be recognized by including a histidine residue at position 3 of a zinc finger. A further substitution rule is that asparagine can be incorporated to recognize A in the middle of a triplet; aspartic acid, glutamic acid, serine or threonine can be incorporated to recognize C in the middle of a triplet; and amino acids with small side chains such as alanine can be incorporated to recognize T in the middle of a triplet. A further substitution rule is that the 3′ base of a triplet subsite can be recognized by incorporating the following amino acids at position −1 of the recognition helix: arginine to recognize G, glutamine to recognize A, glutamic acid (or aspartic acid) to recognize C, and threonine to recognize T. Although these substitution rules are useful in designing zinc finger proteins, they do not take into account all possible target sites. Furthermore, the assumption underlying the rules, namely that a particular amino acid in a zinc finger is responsible for binding to a particular base in a subsite, is only approximate. Context-dependent interactions between proximate amino acids in a finger, or binding of multiple amino acids to a single base or vice versa, can cause variation of the binding specificities predicted by the existing substitution rules.
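
The substitution rules listed above can be written as three lookup tables, as in the following Python sketch. It encodes only the rules named in this paragraph and returns N wherever no rule applies; it is a deliberately simplified illustration, not a predictor of actual binding, since the rules are approximate and ignore context and finger position. The table and function names are hypothetical.

```python
# Rules from the paragraph above, keyed by one-letter amino acid code.
POS6_TO_5PRIME = {"R": "G"}                                    # position +6
POS3_TO_MIDDLE = {"H": "G", "N": "A",                          # position +3
                  "D": "C", "E": "C", "S": "C", "T": "C",
                  "A": "T"}
POSM1_TO_3PRIME = {"R": "G", "Q": "A", "E": "C", "D": "C", "T": "T"}  # position -1


def predict_triplet(helix):
    """Predict a 5'->3' triplet from residues at positions -1 to +6.

    `helix` holds seven residues in the order -1, 1, 2, 3, 4, 5, 6.
    Bases for which no listed rule applies are reported as 'N'.
    """
    if len(helix) != 7:
        raise ValueError("expected seven residues (positions -1 to +6)")
    minus1, pos3, pos6 = helix[0], helix[3], helix[6]
    return (POS6_TO_5PRIME.get(pos6, "N")
            + POS3_TO_MIDDLE.get(pos3, "N")
            + POSM1_TO_3PRIME.get(minus1, "N"))


if __name__ == "__main__":
    # Recognition helices taken from the Zif268 finger sequences given above.
    for helix in ("RSDELTR", "RSDHLTT", "RSDERKR"):
        print(helix, predict_triplet(helix))
```

For the Zif268 helices the sketch returns GCG, NGG and GCG, consistent with the 5′GCG TGG GCG3′ target where a rule exists; the 5′ T bound by the second finger falls outside the listed rules, which illustrates their incomplete coverage.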

[0065] The technique of phage display provides a largely empirical means of generating zinc finger proteins with a desired target specificity (see e.g., Rebar, U.S. Pat. No. 5,789,538; Choo et al., WO 96/06166; Barbas et al., WO 95/19431 and WO 98/543111; Jamieson et al., supra). The method can be used in conjunction with, or as an alternative to, rational design. The method involves the generation of diverse libraries of mutagenized zinc finger proteins, followed by the isolation of proteins with desired DNA-binding properties using affinity selection methods. To use this method, the experimenter typically proceeds as follows. First, a gene for a zinc finger protein is mutagenized to introduce diversity into regions important for binding specificity and/or affinity. In a typical application, this is accomplished via randomization of a single finger at positions −1, +2, +3, and +6, and sometimes accessory positions such as +1, +5, +8 and +10. Next, the mutagenized gene is cloned into a phage or phagemid vector as a fusion with gene III of a filamentous phage, which encodes the coat protein pIII. The zinc finger gene is inserted between segments of gene III encoding the membrane export signal peptide and the remainder of pIII, so that the zinc finger protein is expressed as an amino-terminal fusion with pIII or in the mature, processed protein. When using phagemid vectors, the mutagenized zinc finger gene may also be fused to a truncated version of gene III encoding, minimally, the C-terminal region required for assembly of pIII into the phage particle. The resultant vector library is transformed into E. coli and used to produce filamentous phage which express variant zinc finger proteins on their surface as fusions with the coat protein pIII. If a phagemid vector is used, then this step requires superinfection with helper phage. The phage library is then incubated with the target DNA site, and affinity selection methods are used to isolate phage which bind target with high affinity from bulk phage. Typically, the DNA target is immobilized on a solid support, which is then washed under conditions sufficient to remove all but the tightest binding phage. After washing, any phage remaining on the support are recovered via elution under conditions which disrupt zinc finger-DNA binding. Recovered phage are used to infect fresh E. coli, which is then amplified and used to produce a new batch of phage particles. Selection and amplification are then repeated as many times as is necessary to enrich the phage pool for tight binders such that these may be identified using sequencing and/or screening methods. Although the method is illustrated for pIII fusions, analogous principles can be used to screen ZFP variants as pVIII fusions.

[0066] In certain embodiments, the sequence bound by a particular zinc finger protein is determined by conducting binding reactions (see, e.g., conditions for determination of Kd, infra) between the protein and a pool of randomized double-stranded oligonucleotide sequences. The binding reaction is analyzed by an electrophoretic mobility shift assay (EMSA), in which protein-DNA complexes undergo retarded migration in a gel and can be separated from unbound nucleic acid. Oligonucleotides which have bound the finger are purified from the gel and amplified, for example, by a polymerase chain reaction. The selection (i.e. binding reaction and EMSA analysis) is then repeated as many times as desired, with the selected oligonucleotide sequences. In this way, the binding specificity of a zinc finger protein having a particular amino acid sequence is determined.

[0067] Zinc finger proteins are often expressed with a heterologous domain as fusion proteins. Common domains for addition to the ZFP include, e.g., transcription factor domains (activators, repressors, co-activators, co-repressors), silencers, oncogenes (e.g., myc, jun, fos, myb, max, mad, rel, ets, bcl, mos family members etc.); DNA repair enzymes and their associated factors and modifiers; DNA rearrangement enzymes and their associated factors and modifiers; chromatin associated proteins and their modifiers (e.g. kinases, acetylases and deacetylases); and DNA modifying enzymes (e.g., methyltransferases, topoisomerases, helicases, ligases, kinases, phosphatases, polymerases, endonucleases) and their associated factors and modifiers. A preferred domain for fusing with a ZFP when the ZFP is to be used for repressing expression of a target gene is a KRAB repression domain from the human KOX-1 protein (Thiesen et al., New Biologist 2, 363-374 (1990); Margolin et al., Proc. Natl. Acad. Sci. USA 91, 4509-4513 (1994); Pengue et al., Nucl. Acids Res. 22:2908-2914 (1994); Witzgall et al., Proc. Natl. Acad. Sci. USA 91, 4514-4518 (1994)). Preferred domains for achieving activation include the HSV VP16 activation domain (see, e.g., Hagmann et al., J. Virol. 71, 5952-5962 (1997)); nuclear hormone receptors (see, e.g., Torchia et al., Curr. Opin. Cell. Biol. 10:373-383 (1998)); the p65 subunit of nuclear factor kappa B (Bitko & Barik, J. Virol. 72:5610-5618 (1998); Doyle & Hunt, Neuroreport 8:2937-2942 (1997); Liu et al., Cancer Gene Ther. 5:3-28 (1998)); or artificial chimeric functional domains such as VP64 (Seipel et al., EMBO J. 11, 4961-4968 (1992)).

[0068] An important factor in the administration of polypeptide compounds, such as the ZFPs, is ensuring that the polypeptide has the ability to traverse the plasma membrane of a cell, or the membrane of an intra-cellular compartment such as the nucleus. Cellular membranes are composed of lipid-protein bilayers that are freely permeable to small, nonionic lipophilic compounds and are inherently impermeable to polar compounds, macromolecules, and therapeutic or diagnostic agents. However, proteins and other compounds such as liposomes have been described, which have the ability to translocate polypeptides such as ZFPs across a cell membrane.

[0069] For example, “membrane translocation polypeptides” have amphiphilic or hydrophobic amino acid subsequences that have the ability to act as membrane-translocating carriers. In one embodiment, homeodomain proteins have the ability to translocate across cell membranes. The shortest internalizable peptide of a homeodomain protein, Antennapedia, was found to be the third helix of the protein, from amino acid position 43 to 58 (see, e.g., Prochiantz, Current Opinion in Neurobiology 6:629-634 (1996)). Another subsequence, the h (hydrophobic) domain of signal peptides, was found to have similar cell membrane translocation characteristics (see, e.g., Lin et al., J. Biol. Chem. 270:14255-14258 (1995)).

[0070] Examples of peptide sequences which can be linked to a ZFP, for facilitating uptake of ZFP into cells, include, but are not limited to: an 11 amino acid peptide of the tat protein of HIV; a 20 residue peptide sequence which corresponds to amino acids 84-103 of the p16 protein (see Fahraeus et al., Current Biology 6:84 (1996)); the third helix of the 60-amino acid long homeodomain of Antennapedia (Derossi et al., J. Biol. Chem. 269:10444 (1994)); the h region of a signal peptide such as the Kaposi fibroblast growth factor (K-FGF) h region (Lin et al., supra); or the VP22 translocation domain from HSV (Elliot & O'Hare, Cell 88:223-233 (1997)). Other suitable chemical moieties that provide enhanced cellular uptake may also be chemically linked to ZFPs.

[0071] Toxin molecules also have the ability to transport polypeptides across cell membranes. Often, such molecules are composed of at least two parts (called “binary toxins”): a translocation or binding domain or polypeptide and a separate toxin domain or polypeptide. Typically, the translocation domain or polypeptide binds to a cellular receptor, and then the toxin is transported into the cell. Several bacterial toxins, including Clostridium perfringens iota toxin, diphtheria toxin (DT), Pseudomonas exotoxin A (PE), pertussis toxin (PT), Bacillus anthracis toxin, and pertussis adenylate cyclase (CYA), have been used in attempts to deliver peptides to the cell cytosol as internal or amino-terminal fusions (Arora et al., J. Biol. Chem., 268:3334-3341 (1993); Perelle et al., Infect. Immun., 61:5147-5156 (1993); Stenmark et al., J. Cell Biol. 113:1025-1032 (1991); Donnelly et al., PNAS 90:3530-3534 (1993); Carbonetti et al., Abstr. Annu. Meet. Am. Soc. Microbiol. 95:295 (1995); Sebo et al., Infect. Immun. 63:3851-3857 (1995); Klimpel et al., PNAS U.S.A. 89:10277-10281 (1992); and Novak et al., J. Biol. Chem. 267:17186-17193 (1992)).

[0072] Such subsequences can be used to translocate ZFPs across a cell membrane. ZFPs can be conveniently fused to or derivatized with such sequences. Typically, the translocation sequence is provided as part of a fusion protein. Optionally, a linker can be used to link the ZFP and the translocation sequence. Any suitable linker can be used, e.g., a peptide linker.

[0073] III. Position Dependence of Subsite Recognition by Zinc Fingers

[0074] A number of the polypeptides disclosed herein have been characterized using the methods disclosed in parent application Ser. No. 09/716,637 (the disclosure of which is hereby incorporated by reference in its entirety); in particular with respect to the effect of their position, within a multi-finger protein, on their sequence specificity. The results of these investigations provide a set of zinc finger sequences that are optimized for recognition of certain triplet target subsites whose 5′-most nucleotide is a G (i.e., GNN triplet subsites). Thus, particular zinc finger sequences which recognize each of the GNN triplet subsites, from each position of a three-finger zinc finger protein, are provided. See FIG. 2. It will be clear to those of skill in the art that the optimized, position-specific zinc finger sequences disclosed herein for recognition of GNN target subsites are not limited to use in three-finger proteins. For example, they are also useful in six-finger proteins, which can be made by linkage of two three-finger proteins.

[0075] A number of zinc finger amino acid sequences which are reported to bind to target subsites in which the 5′-most nucleotide residue is G (i.e., GNN subsites) have recently been disclosed. Segal et al. (1999) Proc. Natl. Acad. Sci. USA 96:2758-2763; Dreier et al. (2000) J. Mol. Biol. 303:489-502; U.S. Pat. No. 6,140,081. These GNN-binding zinc fingers were obtained by selection of finger 2 sequences from phage display libraries of three-finger proteins, in which certain amino acid residues of finger 2 had been randomized. Due to the manner in which they were selected, it is not clear whether these sequences would have the same target subsite specificity if they were present in the F1 and/or F3 positions.

[0076] Use of the methods and compositions disclosed herein has now allowed identification of specific zinc finger sequences that bind each of the 16 GNN triplet subsites, and for the first time, provides zinc finger sequences that are optimized for recognition of these triplet subsites in a position-dependent fashion. Moreover, in vivo studies of these optimized designs reveal that the functionality of a ZFP is correlated with its binding affinity to its target sequence. See Example 6, infra.

[0077] As a result of the discovery, disclosed herein, that sequence recognition by zinc fingers is position-dependent, it is clear that existing design rules will not, in and of themselves, be applicable to every situation in which it is necessary to construct a sequence-specific ZFP. The results disclosed herein show that many zinc fingers that are constructed based on design rules exhibit the sequence specificity predicted by those design rules only at certain finger positions. The position-specific zinc fingers disclosed herein are likely to function more efficiently in vivo and in cultured cells, with fewer nonspecific effects. Highly specific ZFPs, made using position-specific zinc fingers, will be useful tools in studying gene function and will find broad applications in areas as diverse as human therapeutics and plant engineering.

[0078] IV. Production of Zinc Finger Proteins

[0079] ZFP polypeptides and nucleic acids encoding the same can be made using routine techniques in the field of recombinant genetics. Basic texts disclosing the general methods include Sambrook et al., Molecular Cloning, A Laboratory Manual (2nd ed. 1989); Kriegler, Gene Transfer and Expression: A Laboratory Manual (1990); and Current Protocols in Molecular Biology (Ausubel et al., eds., 1994). In addition, nucleic acids less than about 100 bases can be custom ordered from any of a variety of commercial sources, such as The Midland Certified Reagent Company (mcrc@oligos.com), The Great American Gene Company (http://www.genco.com), ExpressGen Inc. (www.expressgen.com), and Operon Technologies Inc. (Alameda, Calif.). Similarly, peptides can be custom ordered from any of a variety of sources, such as PeptidoGenic (pkim@ccnet.com), HTI Bio-products, Inc. (http://www.htibio.com), BMA Biomedicals Ltd (U.K.), and Bio.Synthesis, Inc.

[0080] Oligonucleotides can be chemically synthesized according to the solid phase phosphoramidite triester method first described by Beaucage & Caruthers, Tetrahedron Letts. 22:1859-1862 (1981), using an automated synthesizer, as described in Van Devanter et al., Nucleic Acids Res. 12:6159-6168 (1984). Purification of oligonucleotides is by either denaturing polyacrylamide gel electrophoresis or by reverse phase HPLC. The sequence of the cloned genes and synthetic oligonucleotides can be verified after cloning using, e.g., the chain termination method for sequencing double-stranded templates of Wallace et al., Gene 16:21-26 (1981).

[0081] Two alternative methods are typically used to create the coding sequences required to express newly designed DNA-binding peptides. One protocol is a PCR-based assembly procedure that utilizes six overlapping oligonucleotides (FIG. 1). Three oligonucleotides (oligos 1, 3, and 5 in FIG. 1) correspond to “universal” sequences that encode portions of the DNA-binding domain between the recognition helices. These oligonucleotides typically remain constant for all zinc finger constructs. The other three “specific” oligonucleotides (oligos 2, 4, and 6 in FIG. 1) are designed to encode the recognition helices. These oligonucleotides contain substitutions primarily at positions −1, 2, 3 and 6 on the recognition helices, making them specific for each of the different DNA-binding domains.

[0082] The PCR synthesis is carried out in two steps. First, a double stranded DNA template is created by combining the six oligonucleotides (three universal, three specific) in a four cycle PCR reaction with a low temperature annealing step, thereby annealing the oligonucleotides to form a DNA “scaffold.” The gaps in the scaffold are filled in by a high-fidelity thermostable polymerase; the combination of Taq and Pfu polymerases also suffices. In the second phase of construction, the zinc finger template is amplified by external primers designed to incorporate restriction sites at either end for cloning into a shuttle vector or directly into an expression vector.

[0083] An alternative method of cloning the newly designed DNA-binding proteins relies on annealing complementary oligonucleotides encoding the specific regions of the desired ZFP. This particular application requires that the oligonucleotides be phosphorylated prior to the final ligation step. This is usually performed before setting up the annealing reactions. In brief, the “universal” oligonucleotides encoding the constant regions of the proteins (oligos 1, 2 and 3 of above) are annealed with their complementary oligonucleotides. Additionally, the “specific” oligonucleotides encoding the finger recognition helices are annealed with their respective complementary oligonucleotides. These complementary oligos are designed to fill in the region which was previously filled in by polymerase in the above-mentioned protocol. The complementary oligos to the common oligos 1 and finger 3 are engineered to leave overhanging sequences specific for the restriction sites used in cloning into the vector of choice in the following step. The second assembly protocol differs from the initial protocol in the following aspects: the “scaffold” encoding the newly designed ZFP is composed entirely of synthetic DNA, thereby eliminating the polymerase fill-in step; additionally, the fragment to be cloned into the vector does not require amplification. Lastly, the design of leaving sequence-specific overhangs eliminates the need for restriction enzyme digests of the inserting fragment. Alternatively, changes to ZFP recognition helices can be created using conventional site-directed mutagenesis methods.

[0084] Both assembly methods require that the resulting fragment encoding the newly designed ZFP be ligated into a vector. Ultimately, the ZFP-encoding sequence is cloned into an expression vector. Expression vectors that are commonly utilized include, but are not limited to, a modified pMAL-c2 bacterial expression vector (New England BioLabs) or a eukaryotic expression vector, pcDNA (Promega). The final constructs are verified by sequence analysis.

[0085] Any suitable method of protein purification known to those of skill in the art can be used to purify ZFPs (see, Ausubel, supra, Sambrook, supra). In addition, any suitable host can be used for expression, e.g., bacterial cells, insect cells, yeast cells, mammalian cells, and the like.

[0086] Expression of a zinc finger protein fused to a maltose binding protein (MBP-ZFP) in bacterial strain JM109 allows for straightforward purification through an amylose column (NEB). High expression levels of the zinc finger chimeric protein can be obtained by induction with IPTG since the MBP-ZFP fusion in the pMal-c2 expression plasmid is under the control of the tac promoter (NEB). Bacteria containing the MBP-ZFP fusion plasmids are inoculated into 2×YT medium containing 10 μM ZnCl2, 0.02% glucose, plus 50 μg/ml ampicillin and shaken at 37° C. At mid-exponential growth IPTG is added to 0.3 mM and the cultures are allowed to shake. After 3 hours the bacteria are harvested by centrifugation, disrupted by sonication or by passage through a French pressure cell or through the use of lysozyme, and insoluble material is removed by centrifugation. The MBP-ZFP proteins are captured on an amylose-bound resin, washed extensively with buffer containing 20 mM Tris-HCl (pH 7.5), 200 mM NaCl, 5 mM DTT and 50 μM ZnCl2, then eluted with maltose in essentially the same buffer (purification is based on a standard protocol from NEB). Purified proteins are quantitated and stored for biochemical analysis.

[0087] The dissociation constants of the purified proteins, e.g., Kd, are typically characterized via electrophoretic mobility shift assays (EMSA) (Buratowski & Chodosh, in Current Protocols in Molecular Biology pp. 12.2.1-12.2.7 (Ausubel ed., 1996)). Affinity is measured by titrating purified protein against a fixed amount of labeled double-stranded oligonucleotide target. The target typically comprises the natural binding site sequence flanked by the 3 bp found in the natural sequence and additional, constant flanking sequences. The natural binding site is typically 9 bp for a three-finger protein and 2×9 bp plus intervening bases for a six-finger ZFP. The annealed oligonucleotide targets possess a 1 base 5′ overhang which allows for efficient labeling of the target with T4 phage polynucleotide kinase. For the assay the target is added at a concentration of 1 nM or lower (the actual concentration is kept at least 10-fold lower than the expected dissociation constant), purified ZFPs are added at various concentrations, and the reaction is allowed to equilibrate for at least 45 min. In addition, the reaction mixture contains 10 mM Tris (pH 7.5), 100 mM KCl, 1 mM MgCl2, 0.1 mM ZnCl2, 5 mM DTT, 10% glycerol, and 0.02% BSA. (NB: in earlier assays poly d(IC) was also added at 10-100 μg/μl.)

[0088] The equilibrated reactions are loaded onto a 10% polyacrylamide gel, which has been pre-run for 45 min in Tris/glycine buffer, then bound and unbound labeled target is resolved by electrophoresis at 150 V (alternatively, 10-20% gradient Tris-HCl gels, containing a 4% polyacrylamide stacker, can be used). The dried gels are visualized by autoradiography or phosphorimaging and the apparent Kd is determined by calculating the protein concentration that gives half-maximal binding.
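
One way to carry out the half-maximal-binding calculation is sketched below in Python. It assumes that the fraction of target bound has already been quantified for each protein concentration, that the target concentration is well below the Kd (as specified for the assay), and that simple interpolation on a logarithmic concentration axis is adequate. The function name `apparent_kd` and the example titration are hypothetical; curve fitting to a binding isotherm would be an alternative.

```python
import math


def apparent_kd(protein_conc, fraction_bound):
    """Estimate the protein concentration giving half-maximal binding.

    Interpolates linearly in log10(concentration) between the two titration
    points that bracket a bound fraction of 0.5. Concentrations must be
    positive and share one unit (e.g., nM); the result is in that unit.
    """
    points = sorted(zip(protein_conc, fraction_bound))
    for (c_lo, f_lo), (c_hi, f_hi) in zip(points, points[1:]):
        if f_lo <= 0.5 <= f_hi and f_hi > f_lo:
            t = (0.5 - f_lo) / (f_hi - f_lo)
            log_c = math.log10(c_lo) + t * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    raise ValueError("titration does not bracket half-maximal binding")


if __name__ == "__main__":
    # Synthetic titration generated from an ideal 1:1 isotherm with Kd = 2 nM.
    conc_nm = [0.1, 0.3, 1, 3, 10, 30]
    bound = [c / (c + 2.0) for c in conc_nm]
    print(apparent_kd(conc_nm, bound))
```

With sparse titration points the interpolated value only approximates the true Kd, so closer spacing of concentrations around the midpoint improves the estimate.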

[0089] The assays can also include determining active fractions in the protein preparations. Active fractions are determined by stoichiometric gel shifts where proteins are titrated against a high concentration of target DNA. Titrations are done at 100, 50, and 25% of target (usually at micromolar levels).

[0090] V. Applications of Engineered Zinc Finger Proteins

[0091] ZFPs that bind to a particular target gene, and the nucleic acids encoding them, can be used for a variety of applications. These applications include therapeutic methods in which a ZFP or a nucleic acid encoding it is administered to a subject and used to modulate the expression of a target gene within the subject. See, for example, co-owned WO 00/41566. The modulation can be in the form of repression, for example, when the target gene resides in a pathological infecting microorganism, or in an endogenous gene of the patient, such as an oncogene or viral receptor, that is contributing to a disease state. Alternatively, the modulation can be in the form of activation when activation of expression or increased expression of an endogenous cellular gene can ameliorate a diseased state. For such applications, ZFPs, or more typically, nucleic acids encoding them are formulated with a pharmaceutically acceptable carrier as a pharmaceutical composition.

[0092] Pharmaceutically acceptable carriers are determined in part by the particular composition being administered, as well as by the particular method used to administer the composition (see, e.g., Remington's Pharmaceutical Sciences, 17th ed. 1985). The ZFPs, alone or in combination with other suitable components, can be made into aerosol formulations (i.e., they can be “nebulized”) to be administered via inhalation. Aerosol formulations can be placed into pressurized acceptable propellants, such as dichlorodifluoromethane, propane, nitrogen, and the like. Formulations suitable for parenteral administration, such as, for example, by intravenous, intramuscular, intradermal, and subcutaneous routes, include aqueous and non-aqueous, isotonic sterile injection solutions, which can contain antioxidants, buffers, bacteriostats, and solutes that render the formulation isotonic with the blood of the intended recipient, and aqueous and non-aqueous sterile suspensions that can include suspending agents, solubilizers, thickening agents, stabilizers, and preservatives. Compositions can be administered, for example, by intravenous infusion, orally, topically, intraperitoneally, intravesically or intrathecally. The formulations of compounds can be presented in unit-dose or multi-dose sealed containers, such as ampules and vials. Injection solutions and suspensions can be prepared from sterile powders, granules, and tablets of the kind previously described.

[0093] The dose administered to a patient should be sufficient to effect a beneficial therapeutic response in the patient over time. The dose is determined by the efficacy and K_(d) of the particular ZFP employed, the target cell, and the condition of the patient, as well as the body weight or surface area of the patient to be treated. The size of the dose also is determined by the existence, nature, and extent of any adverse side-effects that accompany the administration of a particular compound or vector in a particular patient.

[0094] In other applications, ZFPs are used in diagnostic methods for sequence-specific detection of target nucleic acid in a sample. For example, ZFPs can be used to detect variant alleles associated with a disease or phenotype in patient samples. As an example, ZFPs can be used to detect the presence of particular mRNA species or cDNA in a complex mixture of mRNAs or cDNAs. As a further example, ZFPs can be used to quantify copy number of a gene in a sample. For example, detection of loss of one copy of a p53 gene in a clinical sample is an indicator of susceptibility to cancer. In a further example, ZFPs are used to detect the presence of pathological microorganisms in clinical samples. This is achieved by using one or more ZFPs specific to genes within the microorganism to be detected. A suitable format for performing diagnostic assays employs ZFPs linked to a domain that allows immobilization of the ZFP on an ELISA plate. The immobilized ZFP is contacted with a sample suspected of containing a target nucleic acid under conditions in which binding can occur. Typically, nucleic acids in the sample are labeled (e.g., in the course of PCR amplification). Alternatively, unlabelled probes can be detected using a second labelled probe. After washing, bound labelled nucleic acids are detected.

[0095] ZFPs also can be used for assays to determine the phenotype and function of gene expression. Current methodologies for determination of gene function rely primarily upon either overexpressing the gene of interest or removing it (knocking it out completely) from its natural biological setting and observing the effects. The phenotypic effects observed indicate the role of the gene in the biological system.

[0096] One advantage of ZFP-mediated regulation of a gene relative to conventional knockout analysis is that expression of the ZFP can be placed under small molecule control. By controlling expression levels of the ZFPs, one can in turn control the expression levels of a gene regulated by the ZFP to determine what degree of repression or stimulation of expression is required to achieve a given phenotypic or biochemical effect. This approach has particular value for drug development. By putting the ZFP under small molecule control, problems of embryonic lethality and developmental compensation can be avoided by switching on the ZFP repressor at a later stage in mouse development and observing the effects in the adult animal. Transgenic mice having target genes regulated by a ZFP can be produced by integration of the nucleic acid encoding the ZFP at any site in trans to the target gene. Accordingly, homologous recombination is not required for integration of the nucleic acid. Further, because the ZFP is trans-dominant, only one chromosomal copy is needed and therefore functional knock-out animals can be produced without backcrossing.

[0097] All references cited above are hereby incorporated by reference in their entirety for all purposes.

EXAMPLES

Example 1 Initial Design of Zinc Finger Proteins and Determination of Binding Affinity

[0098] Initial ZFP designs were based on existing design rules, correspondence regimes and ZFP directories, including those disclosed herein (see Tables 1-5) and also in WO 98/53058; WO 98/53059; WO 98/53060 and co-owned U.S. patent application Ser. No. 09/444,241. See also WO 00/42219. Amino acid sequences were conceptually designed using amino acids 532-624 of the human transcription factor Sp1 as a backbone. Polynucleotides encoding designed ZFPs were assembled using a Polymerase Chain Reaction (PCR)-based procedure that utilizes six overlapping oligonucleotides. PCR products were directly cloned into the Tac promoter vector, pMal-c2 (New England Biolabs, Beverly, Mass.) using the KpnI and BamHI restriction sites. The encoded maltose binding protein-ZFP fusion polypeptides were purified according to the manufacturer's procedures (New England Biolabs, Beverly, Mass.). Binding affinity was measured by gel mobility-shift analysis. All of these procedures are described in detail in co-owned WO 00/41566 and WO 00/42219, as well as in Zhang et al. (2000) J. Biol. Chem. 275:33,850-33,860 and Liu et al. (2001) J. Biol. Chem. 276:11,323-11,334; the disclosures of which are hereby incorporated by reference in their entireties.

Example 2 Optimization of Binding Specificity by Site Selection

[0099] Designed ZFPs were tested for binding specificity using site selection methods disclosed in parent application U.S. Ser. No. 09/716,637. Briefly, designed proteins were incubated with a population of labeled, double-stranded oligonucleotides comprising a library of all possible 9- or 10-nucleotide target sequences. Five nanomoles of labeled oligonucleotides were incubated with protein, at a protein concentration 4-fold above its K_(d) for its target sequence. The mixture was subjected to gel electrophoresis, and bound oligonucleotides were identified by mobility shift and extracted from the gel. The purified bound oligonucleotides were amplified, and the amplification products were used for a subsequent round of selection. At each round of selection, the protein concentration was decreased by 2-fold. After 3-5 rounds of selection, amplification products were cloned into the TOPO TA cloning vector (Invitrogen, Carlsbad, Calif.), and the nucleotide sequences of approximately 20 clones were determined. The identities of the target sites bound by a designed protein were determined from the sequences and expressed as a compilation of subsite binding sequences.
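The bookkeeping in this example can be sketched in a few lines of code. The sketch below is illustrative only: the function names are hypothetical, the protein schedule assumes that "4-fold above its K_(d)" means a first-round concentration of 4 × K_(d), and the compilation assumes 9-nucleotide selected sites in which finger 3 contacts the 5′-most triplet and finger 1 the 3′-most triplet (the orientation used for EP2C in Example 5).

```python
from collections import Counter


def protein_schedule(kd_nM, rounds):
    """Protein concentration per round: starts at 4 x Kd, halved each round."""
    return [4.0 * kd_nM / (2 ** i) for i in range(rounds)]


def subsite_counts(clones):
    """Compile selected 9-nt sites into per-finger triplet counts.

    clones -- iterable of selected site sequences written 5' to 3'
    """
    counts = {"F1": Counter(), "F2": Counter(), "F3": Counter()}
    for seq in clones:
        seq = seq.upper()
        if len(seq) != 9:
            raise ValueError(f"expected a 9-nt site, got {seq!r}")
        counts["F3"][seq[0:3]] += 1  # 5'-most triplet
        counts["F2"][seq[3:6]] += 1  # middle triplet
        counts["F1"][seq[6:9]] += 1  # 3'-most triplet
    return counts
```

Applied to the sequences of the roughly 20 clones obtained after the final round, `subsite_counts` yields, for each finger, how often each triplet was selected, which is the form in which the site selection results are reported.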

Example 3 Comparison of Site Selection Results With Binding Affinity

[0100] To test the correlation between site selection results and the affinity of binding of a ZFP to various related targets, site selection experiments were conducted on two three-finger ZFPs, denoted ZFP1 and ZFP2, and the site selection results were compared with K_(d) measurements obtained from quantitative gel-mobility shift assays using the same ZFPs and target sites. Each ZFP was constructed, based on design rules, to bind to a particular nine-nucleotide target sequence (comprising 3 three-nucleotide subsites), as shown in FIG. 1. Site selection results and affinity measurements are also shown in FIG. 1. The site selection results showed that fingers 1 and 3 of both the ZFP1 and ZFP2 proteins preferentially selected their intended target sequences. However, the second finger of each ZFP preferentially selected a subsite other than the one to which it was designed to bind (e.g., F2 of ZFP1 was designed to bind TCG, but preferentially selected GTG; F2 of ZFP2 was designed to bind GGT, but preferentially selected GGA).

[0101] To confirm the site selection results, binding affinities of ZFP1 and ZFP2 were measured (see Example 1, supra), both to their original target sequences and to new target sequences reflecting the site selection results. For example, the Mt-1 sequence contains two base changes (compared to the original target sequence for ZFP1) which result in a change in the sequence of the finger 2 subsite to GTG, reflecting the preferred finger 2 subsite sequence obtained by site selection. In agreement with the site selection results, binding of ZFP1 to the Mt-1 sequence is approximately 4-fold stronger than its binding to the original target sequence (K_(d) of 12.5 nM compared to a K_(d) of 50 nM, see FIG. 1).

[0102] For ZFP2, the specificity of finger 2 for the 3′ base of its target subsite was tested, since, although this finger was designed to bind GGT, site selection indicated that it bound preferentially to GGA. Moreover, the site selection results predicted that finger 2 of ZFP2 would bind with approximately equal affinity to GGT and GGC. Accordingly, target sequences containing GGA (Mt-3) and GGC (Mt-4) at the finger 2 subsite were constructed, and binding affinities of ZFP2 to these target sequences, and to its original target sequence (containing GGT at the finger 2 subsite), were compared. In complete agreement with the site selection results, ZFP2 exhibited the strongest binding affinity for the target sequence containing GGA at the finger 2 subsite (K_(d) of 0.5 nM, FIG. 1), and its affinity for target sequences containing either GGT or GGC at the finger 2 subsite was approximately equal (K_(d) of 1 nM for both targets, FIG. 1). Accordingly, the site selection method, in addition to being useful for iterative optimization of binding specificity, can also be used as an indicator of binding affinity.

Example 4 Use of Site Selection to Identify Position-Dependent, GNN-Binding Zinc Fingers

[0103] A large number of engineered ZFPs have been evaluated, by site selection, to identify zinc fingers that bind to GNN target subsites. In the course of these studies, it became apparent that the binding specificity of a particular zinc finger sequence is, in some instances, dependent upon the position of the zinc finger in the protein, and hence upon the location of the target subsite within the target sequence. For example, if one wishes to design a three-finger zinc finger protein to bind to a target sequence containing the triplet subsite GAT, it is necessary to know whether this subsite is the first, second or third subsite in the target sequence (i.e., whether the GAT subsite will be bound by the first, second or third finger of the protein). Accordingly, over 110 three-finger zinc finger proteins, containing potential GNN-recognizing zinc fingers in various locations, have been evaluated by site selection experiments. Generally, several zinc finger sequences were designed to recognize each GNN triplet, and each design was tested in each of the F1, F2 and F3 positions through 4 to 6 rounds of selection.

[0104] The results of these analyses, shown in FIG. 2, provide optimal position-dependent zinc finger sequences (the sequences shown represent amino acid residues −1 through +6 of the recognition helix portion of the finger) for recognition of the 16 GNN target subsites, as well as site selection results for these GNN-specific zinc fingers. Optimal amino acid sequences for recognition of each GNN subsite from each of three positions (finger 1, finger 2 or finger 3) are thereby provided.

[0105] GNG-Binding Finger Designs

[0106] The amino acid sequence RSDXLXR (position −1 to +6 of the recognition helix) was found to be optimal for binding to the four GNG triplets, with Asn⁺³ specifying A as the middle nucleotide; His⁺³ specifying G as the middle nucleotide; Ala⁺³ specifying T as the middle nucleotide; and Asp⁺³ specifying C as the middle nucleotide. At the +5 position, Ala, Thr, Ser, and Gln were tested, and all showed similar specificity profiles by site selection. Interestingly, and in contrast to a previous report (Swirnoff et al. (1995) Mol. Cell. Biol. 15:2275-2287), site selection results indicated that three naturally-occurring GCG-binding fingers from zif268 and Sp1, having the amino acid sequences RSDELTR, RSDELQR, and RSDERKR, were not GCG-specific. Rather, each of these fingers selected almost equal numbers of GCG and GTG sequences. Analysis of binding affinity by gel-shift experiments confirmed that finger 3 of zif268, having the sequence RSDERKR, binds GCG and GTG with approximately equal affinity.
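The RSDXLXR rule lends itself to a simple lookup. The sketch below is a hypothetical helper, not part of the disclosed method: it fills the +3 position according to the middle nucleotide of a GNG triplet as stated above, and fills the +5 position with one of the four residues reported to give similar specificity profiles (Thr is used as an arbitrary default).

```python
# +3 residue (one-letter code) chosen by the middle nucleotide of the GNG triplet
PLUS3_FOR_MIDDLE = {"A": "N", "G": "H", "T": "A", "C": "D"}

# +5 residues reported as tested, with similar specificity profiles
PLUS5_TESTED = {"A", "T", "S", "Q"}


def gng_helix(triplet, plus5="T"):
    """Return a recognition helix (-1 through +6) of the form RSDXLXR."""
    triplet = triplet.upper()
    if len(triplet) != 3 or triplet[0] != "G" or triplet[2] != "G":
        raise ValueError(f"{triplet!r} is not a GNG triplet")
    if triplet[1] not in PLUS3_FOR_MIDDLE:
        raise ValueError(f"unknown middle nucleotide in {triplet!r}")
    if plus5 not in PLUS5_TESTED:
        raise ValueError(f"+5 residue {plus5!r} was not among those tested")
    # Positions: -1=R, +1=S, +2=D, +3=X, +4=L, +5=X, +6=R
    return "RSD" + PLUS3_FOR_MIDDLE[triplet[1]] + "L" + plus5 + "R"
```

For instance, `gng_helix("GAG")` gives RSDNLTR and `gng_helix("GTG", "A")` gives RSDALAR, both of which appear among the fingers listed in Tables 1 and 2.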

[0107] Position Dependence of GCA-, GAT-, GGT-, GAA- and GCC-Binding Fingers

[0108] Based on existing design rules, the amino acid sequence QSGDLTR (−1 through +6) was tested for its ability to bind the GCA triplet from three positions (F1, F2, and F3) within a three-finger ZFP. FIG. 3A shows that the QSGDLTR sequence bound preferentially to the GCA triplet subsite from the F2 and F3 positions, but not from F1. In fact, the presence of QSGDLTR at the F1 position of three different three-finger ZFPs resulted predominantly in selection of GCT. Accordingly, an attempt was made to redesign this sequence to obtain specificity for GCA from the F1 position. Since the sequence Q⁻¹G⁺²S⁺³R⁺⁶ had previously been selected from a randomized F1 library using GCA as target (Rebar et al. (1994) Science 263:671-673), a D (Asp) to S (Ser) change was made at the +3 residue of this finger. The resulting sequence, QSGSLTR, was tested for its binding specificity by site selection and found to preferentially bind GCA, from the F1 position, in three different ZFPs (see FIG. 2).

[0109] The QSGSLTR zinc finger, optimized for recognition of the GCA subsite from the F1 position, was tested for its selectivity when located at the F2 position. Accordingly, two ZFPs, one containing QSGSLTR at finger 2 and one containing QSGDLTR at finger 2 (both having identical F1 sequences and identical F3 sequences) were tested by site selection. The results indicated that, when used at the F2 position, QSGSLTR bound preferentially to GTA, rather than GCA. Thus, for optimal binding of a GCA triplet subsite from the F1 position, the amino acid sequence QSGSLTR is required; while, for optimal binding of the same subsite sequence from F2 or F3, QSGDLTR should be used. Accordingly, different zinc finger amino acid sequences may be needed to specify a particular triplet subsite sequence, depending upon the location of the subsite within the target sequence and, hence, upon the position of the finger in the protein.
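One way to capture this position dependence in a design tool is to key the lookup on both the triplet and the finger position, rather than on the triplet alone. The sketch below is a hypothetical illustration containing only the GCA entries stated in the text; a working table would need the remaining entries filled in from FIG. 2.

```python
# Recognition helices (-1 through +6) keyed by (triplet, finger position).
# Only the GCA entries described in the text are included here.
POSITION_SPECIFIC_HELIX = {
    ("GCA", "F1"): "QSGSLTR",
    ("GCA", "F2"): "QSGDLTR",
    ("GCA", "F3"): "QSGDLTR",
}


def helix_for(triplet, position):
    """Look up the helix for a triplet subsite at a given finger position."""
    key = (triplet.upper(), position.upper())
    if key not in POSITION_SPECIFIC_HELIX:
        raise KeyError(f"no entry for {key[0]} at {key[1]}")
    return POSITION_SPECIFIC_HELIX[key]
```

Keying on the pair makes the failure mode explicit: a request for a triplet at a position that has not been characterized raises an error instead of silently reusing a helix validated at a different position.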

[0110] Positional effects were also observed for zinc fingers recognizing GAT and GGT subsites. The zinc finger amino acid sequence QSSNLAR (−1 through +6) is expected to bind to GAT, based on design rules. However, this sequence selected GAT only from the F1 position, and not from the F2 and F3 positions, from which the sequence GAA was preferentially bound (FIG. 3B). Similarly, the amino acid sequence QSSHLTR which, based on design rules, should bind GGT, selected GGT at the F1 position, but not at the F2 and F3 positions, from which it preferentially bound GGA (FIG. 3C). Conversely, the amino acid sequence TSGHLVR has previously been disclosed to recognize the triplet GGT, based on its selection from a randomized library of zif268 finger 2 (U.S. Pat. No. 6,140,081). However, TSGHLVR was not specific for the GGT subsite when located at the F1 position (FIG. 3C). These results indicate that the binding specificity of many fingers is position dependent, and particularly point out that the sequence specificity of a zinc finger selected from an F2 library may be positionally limited.

[0111] The results shown in FIG. 2 indicate that recognition of at least GAA and GCC triplets by zinc fingers is also position dependent.

[0112] These positional dependences stand in contrast to earlier published work, which suggested that zinc fingers behaved as independent modules with respect to the sequence specificity of their binding to DNA (Desjarlais et al. (1993) Proc. Natl. Acad. Sci. USA 90:2256-2260).

Example 5 Characterization of EP2C

[0113] The engineered zinc finger protein EP2C binds to a target sequence, GCGGTGGCT, with a dissociation constant (K_(d)) of 2 nM. Site selection results indicated that fingers 1 and 2 are highly specific for their target subsites, while finger 3 selects GCG (its intended target subsite) and GTG at approximately equal frequencies (FIG. 4A). To confirm these observations, the binding affinities of EP2C to its cognate target sequence, and to variant target sequences, were measured by standard gel-shift analyses (see Example 1, supra). As standards for comparison, the binding affinities of Sp1 and zif268 to their respective targets were also measured under the same conditions, and were determined to be 40 nM for Sp1 (target sequence GGGGCGGGG) and 2 nM for zif268 (target sequence GCGTGGGCG). Measurements of binding affinities confirmed that F3 of EP2C bound GTG and GCG equally well (K_(d)s of 2 nM), but bound GAG with a two-fold lower affinity (FIG. 4B). Finger 2 was very specific for the GTG triplet, binding 15-fold less tightly to a GGG triplet (compare 2C0 and 2C3 in FIG. 4B). Finger 1 was also very specific for the GCT triplet; it bound with 4-fold lower affinity to a GAT triplet (2C4) and with 2-fold lower affinity to a GCG triplet (2C5). This example shows, once again, the high degree of correlation between site selection results and binding affinities.
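The fold differences quoted in this example are ratios of dissociation constants, and each ratio can also be expressed as a difference in binding free energy through the standard relation ΔΔG = RT ln(K_(d,variant)/K_(d,cognate)). The sketch below is an illustration only: the function names are hypothetical and the temperature of 298.15 K is an assumption, since the assay temperature is not stated.

```python
import math

R_KCAL_PER_MOL_K = 1.987e-3  # gas constant in kcal/(mol*K)


def fold_weaker(kd_variant_nM, kd_cognate_nM):
    """How many fold less tightly the variant site is bound."""
    return kd_variant_nM / kd_cognate_nM


def delta_delta_g(kd_variant_nM, kd_cognate_nM, temp_K=298.15):
    """Free energy penalty (kcal/mol) for the variant site.

    temp_K is an assumed value; the source does not state the assay temperature.
    """
    return R_KCAL_PER_MOL_K * temp_K * math.log(kd_variant_nM / kd_cognate_nM)
```

With the values reported for EP2C (K_(d) of 2 nM for the cognate 2C0 site and 4 nM for the 2C5 site), `fold_weaker(4.0, 2.0)` is 2, corresponding under the assumed temperature to a penalty of roughly 0.4 kcal/mol.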

Example 6 Evaluation of Engineered ZFPs by in vivo Functional Assays

[0114] To determine whether a correlation exists between the binding affinity of an engineered ZFP to its target sequence and its functionality in vivo, cell-based reporter gene assays were used to analyze the functional properties of the engineered ZFP EP2C (see Example 5, supra). For these assays, a plasmid encoding the EP2C ZFP, fused to a VP16 transcriptional activation domain, was used to construct a stable cell line (T-Rex-293™, Invitrogen, Carlsbad, Calif.) in which expression of EP2C-VP16 is inducible, as described in Zhang et al., supra. To generate reporter constructs, three tandem copies of the EP2C target site, or its variants (see FIG. 4B, column 2), were inserted between the MluI and BglII sites of the pGL3 luciferase-encoding vector (Promega, Madison, Wis.), upstream of the SV40 promoter. Structures of all reporter constructs were confirmed by DNA sequencing.

[0115] Luciferase reporter assays were performed by co-transfection of luciferase reporter construct (200 ng) and pCMV-βgal (100 ng, used as an internal control) into the EP2C cells seeded in 6-well plates. Expression of the EP2C-VP16 transcriptional activator was induced with doxycycline (0.05 μg/ml) 24 h after transfection of reporter constructs. Cell lysates were harvested 40 hours post-transfection; luciferase and β-galactosidase activities were measured by the Dual-Light Reporter Assay System (Tropix, Bedford, Mass.), and luciferase activities were normalized to the co-transfected β-galactosidase activities. The results, shown on the right side of FIG. 4B, showed that the normalized luciferase activity for each reporter construct was well correlated with the in vitro binding affinity of EP2C to the target sequence present in the construct. For example, the target sequences to which EP2C bound with greatest affinity (2C0 and 2C2, K_(d) of 2 nM for each) both stimulated the highest levels of luciferase activity, when used to drive luciferase expression in the reporter construct (FIG. 4B). Target sequences to which EP2C bound with 2-fold lower affinity, 2C1 and 2C5 (K_(d) of 4 nM for each), stimulated roughly half the luciferase activity of the 2C0 and 2C2 targets. The 2C3 and 2C4 sequences, for which EP2C showed the lowest in vitro binding affinities, also yielded the lowest levels of in vivo activity when used to drive luciferase expression. Target 3B, a sequence to which EP2C does not bind, yielded background levels of luciferase activity, similar to those obtained with a luciferase-encoding vector lacking EP2C target sequences (pGL3). Thus there exist good correlations between binding affinity (as determined by K_(d) measurement), binding specificity (as determined by site selection) and in vivo functionality for engineered zinc finger proteins.
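The normalization step described above can be sketched as follows. This is a minimal illustration with hypothetical function names: each luciferase reading is divided by the β-galactosidase reading from the same well, and the normalized values are then expressed relative to a chosen reference construct so that reporters can be compared directly.

```python
def normalized_activity(luciferase, beta_gal):
    """Luciferase signal divided by the co-transfected beta-gal signal."""
    if beta_gal <= 0:
        raise ValueError("beta-galactosidase reading must be positive")
    return luciferase / beta_gal


def relative_activities(readings, reference):
    """Express normalized activities relative to a reference construct.

    readings  -- dict mapping construct name to (luciferase, beta_gal) readings
    reference -- name of the construct whose normalized activity is set to 1.0
    """
    norm = {name: normalized_activity(luc, gal)
            for name, (luc, gal) in readings.items()}
    ref = norm[reference]
    return {name: value / ref for name, value in norm.items()}
```

Dividing by the internal control corrects for well-to-well differences in transfection efficiency and cell number, so that the remaining variation reflects the activity of the target site in each reporter construct.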
TABLE 1 SEQ SEQ SEQ SEQ Kd SBS# TARGET ID F1 ID F2 ID F3 ID(nM) 249 GCGGGGGCG 17 RSDELTR 123 RSDHLSR 229 RSDELRR 335 20 250GCGGGGGCG 18 RSDELTR 124 RSDHLSR 230 RSDTLKK 336 70 251 GCGGAGGCG 19RSDELTR 125 RSDNLTR 231 RSDELRR 337 27.5 252 GCGGCCGCG 20 RSDELTR 126DRSSLTR 232 RSDELRR 338 100 253 GGATGGGGG 21 RSDHLAR 127 RSDHLTT 233QRAHLAR 339 0.75 256 GCGGGGTCC 22 ERGDLTT 128 RSDHLSR 234 RSDELRR 340800 258 GCGGGCGGG 23 RSDHLTR 129 ERGHLTR 235 RSDELRR 341 15 259GCAGAGGAG 24 RSDNLAR 130 RSDNLAR 236 QSGSLTR 342 250 261 GAGGTGGCC 25ERGTLAR 131 RSDALSR 237 RSDNLSR 343 0.5 262 GCGGGGGCT 26 QSSDLQR 132RSDHLSR 238 RSDELRR 344 20 263 GCGGGGGCT 27 QSSDLQR 133 RSDHLSR 239RSDTLKK 345 1 264 GTGGCTGCC 28 DRSSLTR 134 QSSDLQR 240 RSDALAR 346 27265 GTGGCTGCC 29 ERGTLAR 135 QSSDLQR 241 RSDALAR 347 600 269 GGGGCCGGG30 RSDHLTR 136 DRSSLTR 242 RSDHLTR 348 5 270 GGGGCCGGG 31 RSDHLTR 137ERGTLAR 243 RSDHLTR 349 52.5 272 GCAGGGGCC 32 DRSSLTR 138 RSDHLSR 244QSGSLTR 350 20 337 TGCGGGGCAA 33 RSADLTR 139 RSDHLTR 245 ERQHLAT 351 24338 TGCGGGGCAA 34 RSADLTR 140 RSDHLTR 246 ERDHLRT 352 8 339 TGCGGGGCAA35 RSADLTR 141 RSDHLTT 247 ERQHLAT 353 64 340 TGCGGGGCAA 36 RSADLTR 142RSDHLTT 248 ERDHLRT 354 48 341 TGCGGGGCAA 37 RSADLTR 143 RGDHLKD 249ERQHLAT 355 1000 342 TGCGGGGCAA 38 RSADLTR 144 RGDHLKD 250 ERDHLRT 3561000 343 TGCGGGGCAA 39 QSGSLTR 145 RSDHLTR 251 ERQHLAT 357 8 344TGCGGGGCAA 40 QSGSLTR 146 RSDHLTR 252 ERDHLRT 358 6 345 TGCGGGGCAA 41QSGSLTR 147 RSDHLTT 253 ERQHLAT 359 96 346 TGCGGGGCAA 42 QSGSLTR 148RSDHLTT 254 ERDHLRT 360 64 347 TGCGGGGCAA 43 QSGSLTR 149 RGDHLKD 255ERQHLAT 361 1000 348 TGCGGGGCAA 44 QSGSLTR 150 RGDHLKD 256 ERDHLRT 3621000 367 GGGGGCGGG 45 RSDHLTR 151 DSGHLTR 257 RSDHLQR 363 60 368GAGGGGGCG 46 RSDELTR 152 RSDHLTR 258 RSDNLTR 364 3.5 369 GTAGTTGTG 47RSDALTR 153 TGGSLAR 259 QSGSLTR 365 95 370 GTAGTTGTG 48 RSDALTR 154NRATLAR 260 QSASLTR 366 300 371 GTAGTTGTG 49 RSDALTR 155 NRATLAR 261QSGSLTR 367 175 372 GTAGTTGTG 50 RSDSLLR 156 TGGSLAR 262 QSASLTR 368112.5 373 GTAGTTGTG 51 RSDSLLR 
157 NRATLAR 263 QSASLTR 369 320 374GCTGAGGAA 52 QRSNLVR 158 RSDNLTR 264 TSSELQR 370 3.3 375 GAGGAAGAT 53QQSNLAR 159 QSGNLQR 265 RSDNLTR 371 85 401 GTAGTTGTG 54 RSDALTR 160TGGSLAR 266 QSASLTR 372 80 403 GTAGTTGTG 55 RSDSLLR 161 NRATLAR 267QSGSLTR 373 750 421 GTAGTTGTG 56 DSDSLLR 162 TGGSLAR 268 QSGSLTR 374 500422 GTAGTTGTG 57 RSDSLLR 163 TGGSLTR 269 QSGSLTR 375 200 423 GTAGTTGTG58 RSDALTR 164 TGGSLAR 270 QRSALAR 376 1000 424 GATGCTGAG 59 RSDNLTR 165TSSELQR 271 TSANLSR 377 100 425 GATGCTGAG 60 RSDNLTR 166 QSSDLQR 272QQSNLAR 378 25 426 GATGCTGAG 61 RSDNLTR 167 QSSDLQR 273 TSANLSR 379 5.5427 GCTGAGGAA 62 QRSNLVR 168 RSDNLTR 274 QSSDLQR 380 1 428 GAAGATGAC 63DSSNLTR 169 QQSNLAR 275 QRSNLVR 381 120 429 GAAGATGAC 64 DSSNLTR 170TSANLSR 276 QRSNLVR 382 50 430 GATGACGAC 65 EKANLTR 171 DSSNLTR 277QQSNLAR 383 250 431 GACGACGGC 66 DSGHLTR 172 DRSNLER 278 DSSNLTR 384 100432 GACGACGGC 67 DSGHLTR 173 DHANLAR 279 DSSNLTR 385 1000 433 GACGACGGC68 DSGNLTR 174 DHANLAR 280 DSSNLTR 386 1000 434 GACGGCGTA 69 QSASLTR 175DSGHLTR 281 EKANLTR 387 152.5 435 GACGGCGTA 70 QSASLTR 176 DSGHLTR 282ERGNLTR 388 150 436 GACGGCGTA 71 QRSALAR 177 DSGHLTR 283 EKANLTR 389 95437 GACGGCGTA 72 QRSALAR 178 DSGHLTR 284 ERGNLTR 390 117.5 438 GAGGGGGCG73 RSDELTR 179 RSDHLTT 285 RSDNLTR 391 62.5 440 GCCGAGGTGC 74 RSDSLLR180 RSKNLQR 286 ERGTLAR 392 40 441 GGTGGAGTCA 75 DSGSLTR 181 QSGHLQR 287TSGHLTR 393 250 445 GTCGCAGTGA 76 RSDSLRR 182 QSSDLQK 288 DSGSLTR 3941000 450 GACTTGGTGC 77 RSDTLAR 183 RGDALTS 289 DRSNLTR 395 130 453GGTGGAGTCA 78 DRSALAR 184 QSGHLQR 290 DSSKLSR 396 150 461 GAGTACTGTA 79QRSHLTT 185 DRSNLRT 291 RSDNLAR 397 120 463 GTGGAGGAGA 80 RSDNLTR 186RSDNLAR 292 RSDALAR 398 0.5 464 GTGGAGGAGA 81 RSDNLTR 187 RSDNLAR 293RSDSLAR 399 0.4 466 CAGGCTGCGC 82 RSDDLTR 188 QSSDLQR 294 RSDNLRE 400 65467 CAGGCTGCGC 83 RSDELTR 189 QSSDLQR 295 RGDHLKD 401 800 468 CAGGCTGCGC84 RSDDLTR 190 QSSDLQR 296 RGDHLKD 402 42 469 GAAGAGGTCT 85 DRSALAR 191RSDNLAR 297 QSGNLTR 403 13.5 472 GAGGTCTGGA 86 RSSHLTT 192 DRSALAR 
298RSDNLAR 404 80 476 GGAGAGGATG 87 TTSNLRR 193 RSDNLAR 299 QSDHLTR 405 80477 GGAGAGGATG 88 TTSNLRR 194 RSDNLAR 300 QRAHLAR 406 100 478 GGAGAGGATG89 TTSNLRR 195 RSDNLAR 301 QSGHLRR 407 60 479 GTGGCGGACC 90 DSSNLTR 196RSDELQR 302 RSDALAR 408 8.5 480 GTGGCGGACC 91 DSSNLTR 197 RADTLRR 303RSDALAR 409 5 483 GAGGGCGAAG 92 QSANLAR 198 ESSKLKR 304 RSDNLAR 410 130484 GAGGGCGAAG 93 QSDNLAR 199 ESSKLKR 305 RSDNLAR 411 1000 485GGAGAGGTTT 94 QSSALAR 200 RSDNLAR 306 QRAHLAR 412 110 487 GGAGAGGTTT 95NRATLAR 201 RSDNLAR 307 QSGHLAR 413 76.9 488 TGGTAGGGGG 96 RSDHLAR 202RSDNLTT 308 RSDHLTT 414 35 490 TAGGGGGTGG 97 RSDSLLR 203 RSDHLTR 309RSDNLTT 415 1.5 503 GCCGAGGTGC 98 RSDSLLR 204 RSDNLAR 310 ERGTLAR 416 50504 GCCGAGGTGC 99 RSDSLLR 205 RSDNLAR 311 DRSDLTR 417 25 505 GCCGAGGTGC100 RSDSLLR 206 RSDNLAR 312 DCRDLAR 418 65 526 GCGGGCGGGC 101 RSDHLTR207 ERGHLTR 313 RSDTLKK 419 8 543 GAGTGTGTGA 102 RSDLLQR 208 MSHHLKE 314RSDHLSR 420 50 544 GAGTGTGTGA 103 RSDSLLR 209 MSHHLKE 315 RSDNLAR 421125 545 GAGTGTGTGA 104 RKDSLVR 210 TSDHLAS 316 RSDNLTR 422 32 546GAGTGTGTGA 105 RSDLLQR 211 MSHHLKT 317 RLDGLRT 423 500 547 GAGTGTGTGA106 RKDSLVR 212 TSGHLTS 318 RSDNLTR 424 500 548 GAGTGTGTGA 107 RSSLLQR213 MSHHLKT 319 RSDHLSR 425 500 549 GAGTGTGTGA 108 RSSLLQR 214 MSHHLKE320 RSDHLSR 426 500 550 GAGTGTGTGA 109 RKDSLVR 215 TKDHLAS 321 RSDNKLTR427 20 551 GAGTGTGTGA 110 RSDLLQR 216 MSHHLKT 322 RSDHLSR 428 50 552GAGTGTGTGA 111 RKDSLVR 217 MSHHLKT 323 RSDNLTR 429 31 553 GAGTGTGTGA 112RSDSLLR 218 MSHHLKE 324 RSDNLTR 430 125 554 GAGTGTGTGA 113 RKDSLVR 219TSDHLAS 325 RSDNLAR 431 62.5 558 TGCGGGGCA 114 QSGDLTR 220 RSDHLTR 326DSGHLAS 432 21 559 GAGTGTGTGA 115 RSDSLLR 221 TSDHLAS 327 RSDNLAR 4331000 560 GAGTGTGTGA 116 RSSLLQR 222 MSHHLKT 328 RSDHLSR 434 500 561GAGTGTGTGA 117 RKDSLVR 223 MSHHLKE 329 RSDNLAR 435 1000 562 GAGTGTGTGA118 RSDSLLR 224 TSGHLTS 330 RSDNLAR 436 1000 565 GATGCTGAG 119 RSDNLTR225 TSSELQR 331 QQSNLAR 437 100 567 GAAGATGAC 120 EKANLTR 226 TSANLSR332 QRSNLVR 438 47.5 568 GATGACGAC 121 EKANLTR 
227 DSSNLTR 333 TSANLSR439 300 569 GTAGTTGTG 122 RSDSLLR 228 TGGSLAR 334 QRSALTR 440 52

[0116] TABLE 2 SEQ SEQ SEQ SEQ Kd SBS# TARGET ID F1 ID F2 ID F3 ID (nM)201 GCAGCCTTG 441 RSDSLTS 646 ERSTLTR 851 QRADLRR 1056 1000 202GCAGCCTTG 442 RSDSLTS 647 ERSTLTR 852 QRADLAR 1057 1000 203 GCAGCCTTG443 RSDSLTS 648 ERSTLTR 853 QRATLRR 1058 1000 204 GCAGCCTTG 444 RSDSLTS649 ERSTLTR 854 QRATLAR 1059 1000 205 GAGGTAGAA 445 QSANLAR 650 QSATLAR855 RSDNLSR 1060 80 206 GAGGTAGAA 446 QSANLAR 651 QSAVLAR 856 RSDNLSR1061 1000 207 GAGTGGTTA 447 QRASLAS 652 RSDHLTT 857 RSDNLAR 1062 70 208TAGGTCTTA 448 QRASLAS 653 DRSALAR 858 RSDNLAS 1063 1000 209 GGAGTGGTT449 QSSALAR 654 RSDALAR 859 QRAHLAR 1064 35 210 GGAGTGGTT 450 NRDTLAR655 RSDALAR 860 QRAHLAR 1065 65 211 GGAGTGGTT 451 QSSALAR 656 RSDALAS861 QRAHLAR 1066 140 212 GGAGTGGTT 452 NRDTLAR 657 RSDALAS 862 QRAHLAR1067 400 213 GTTGCTGGA 453 QRAHLAR 658 QSSTLAR 863 QSSALAR 1068 1000 214GTTGCTGGA 454 QRAHLAR 659 QSSTLAR 864 NRDTLAR 1069 1000 215 GAAGTCTGT455 NRDHLMV 660 DRSALAR 865 QSANLSR 1070 1000 216 GAAGTCTGT 456 NRDHLTT661 DRSALAR 866 QSANLSR 1071 1000 217 GAGGTCGTA 457 QRSALAR 662 DRSALAR867 RSDNLAR 1072 40 219 GATGTTGAT 458 QQSNLAR 663 NRDTLAR 868 NRDNLSR1073 1000 220 GATGTTGAT 459 QQSNLAR 664 NRDTLAR 869 QQSNLSR 1074 1000221 GATGAGTAC 460 DRSNLRT 665 RSDNLAR 870 NRDNLAR 1075 1000 222GATGAGTAC 461 ERSNLRT 666 RSDNLAR 871 NRDNLAR 1076 1000 223 GATGAGTAC462 DRSNLRT 667 RSDNLAR 872 QQSNLAR 1077 105 224 GATGAGTAC 463 ERSNLRT668 RSDNLAR 873 QQSNLAR 1078 1000 225 TGGGAGGTC 464 DRSALAR 669 RSDNLAR874 RSDHLTT 1079 6 226 GCAGCCTTG 465 RGDALTS 670 ERGTLAR 875 QSGSLTR1080 1000 227 GCAGCCTTG 466 RGDALTV 671 ERGTLAR 876 QSGSLTR 1081 1000228 GCAGCCTTG 467 RGDALTM 672 ERGTLAR 877 QSGSLTR 1082 1000 229GCAGCCTTG 468 RGDALTS 673 ERGTLAR 878 RSDELTR 1083 1000 230 GCAGCCTTG469 RGDALTV 674 ERGTLAR 879 RSDELTR 1084 1000 231 GCAGCCTTG 470 RGDALTM675 ERGTLAR 880 RSDELTR 1085 1000 232 GGTGTGGTG 471 RSDALTR 676 RSDALAR881 NRSHLAR 1086 50 233 GGTGTGGTG 472 RSDALTR 677 RSDALAR 882 QASHLAR1087 100 235 GTAGAGGTG 473 RSDALTR 678 RSDNLAR 883 QRGALAR 1088 
80 236GGGGAGGGG 474 RSDHLAR 679 RSDNLAR 884 RSDHLSR 1089 0.3 237 GGGGAGGCC 475ERGTLAR 680 RSDNLAR 885 RSDHLSR 1090 0.3 238 GGGGAGGCC 476 ERGTLAR 681RSDNLQR 886 RSDHLSR 1091 0.8 239 GGCGGGGAG 477 RSDNLTR 682 RSDHLTR 887DRSHLAR 1092 0.4 240 GCAGGGGAG 478 RSDNLTR 683 RSDHLSR 888 QSGSLTR 10931 242 GGGGGTGCT 479 QSSDLRR 684 QSSHLAR 889 RSDHLSR 1094 1 243 GTGGGCGCT480 QSSDLRR 685 DRSHLAR 890 RSDALAR 1095 75 244 TAAGAAGGG 481 RSDHLAR686 QSGNLTR 891 QSGNLRT 1096 100 245 TAAGAAGGG 482 RSDHLAR 687 QSANLTR892 QSGNLRT 1097 235 246 GAAGGGGAG 483 RSDNLAR 688 RSDHLAR 893 QSGNLTR1098 2 247 GAAGGGGAG 484 RSDNLAR 689 RSDHLAR 894 QSGNLRR 1099 2 276GCGGCCGCG 485 RSDELTR 690 ERGTLAR 895 RSDERKR 1100 90 277 GCGGCCGCG 486RSDELTR 691 DRSSLTR 896 RSDERKR 1101 107 278 GCGGCCGCG 487 QSWELTR 692ERGTLAR 897 RSDERKR 1102 190 279 GCGGCCGCG 488 QSWELTR 693 DRSSLTR 898RSDERKR 1103 260 280 GCGGCCGCG 489 QSGSLTR 694 ERGTLAR 899 RSDERKR 1104160 281 GCGGCCGCG 490 QSGSLTR 695 DRSSLTR 900 RSDERKR 1105 225 282GCAGAAGTG 491 RGDALTR 696 QSANLTR 901 QSADLAR 1106 1000 283 GCAGAAGTG492 RSDALTR 697 QSGNLTR 902 QSGSLTR 1107 2 284 GCGGCCGCG 493 QSGSLTR 698RSDHLTT 903 RSDERKR 1108 1000 285 TGTGCGGCC 494 ERGTLRG 699 RSDELTR 904SRDHLQS 1109 1000 287 GCAGAAGCG 495 RGPDLAR 700 QSANLTR 905 QSGSLTR 11101000 288 GCAGAAGCG 496 RGPDLAR 701 QSANLTR 906 QSGSLTR 1111 1000 289GCAGAAGCG 497 RGPDLAR 702 QSGNLQR 907 QSGSLTR 1112 800 290 GCAGAAGCG 498RSDELAR 703 QSANLTR 908 QSADLAR 1113 1000 292 GCAGAAGCG 499 RSDELTR 704QSANLQR 909 QSGSLTR 1114 1000 293 GTGTGCGGC 500 DRSHLTR 705 ERHSLQT 910RSDALTR 1115 320 296 TGCGCGGCC 501 ERGTLAR 706 RSDELTR 911 DRDHLQS 11161000 297 TGCGCGGCC 502 ERGTLAR 707 RSDELRR 912 DRSHLQT 1117 500 298GCTTAGGCA 503 QTGELRR 708 RSDNLQK 913 TSGDLSR 1118 4000 299 GCTTAGGCA504 QTSDLRR 709 RSDNLQK 914 QSSDLQR 1119 4000 300 GCTTAGGCA 505 QTADLRR710 RSDNLQR 915 QSSDLSR 1120 400 301 GCTTAGGCA 506 QSADLRR 711 RSDNLQT916 QSSDLSR 1121 350 302 GCTTAGGCA 507 QSGSLTR 712 RSDNLQT 917 QSSDLSR1122 75 303 GCTTAGGCA 508 
QTGSLTR 713 RSDNLQT 918 QSSDLSR 1123 135 304GCTTAGGCA 509 QTADLTR 714 RSDNLQT 919 QSSDLSR 1124 230 305 GCTTAGGCA 510QTGDLTR 715 RSDNLQT 920 QSSDLSR 1125 230 306 GCTTAGGCA 511 QTASLTR 716RSDNLQT 921 QSSDLSR 1126 280 307 GAAGAAGCG 512 RSDELRR 717 QSGNLQR 922QSGNLSR 1127 50.5 308 GAAGAAGCG 513 RSDELRR 718 QSANLQR 923 QSANLQR 11281000 309 GGAGATGCC 514 ERSDLRR 719 QSSNLQR 924 QSGHLSR 1129 4000 310GGAGATGCC 515 DRSDTTR 720 NRDNLQT 925 QSGHLSR 1130 1000 311 GGAGATGCC516 DRSTLTR 721 NRDNLQR 926 QSGHLSR 1131 170 312 GGAGATGCC 517 ERGTLAR722 NRDNLQR 927 QSGHLSR 1132 2000 313 GGAGATGCC 518 DRSDLTR 723 QRSNLQR928 QSGHLSR 1133 1000 314 GGAGATGCC 519 DRSSLTR 724 QSSNLQR 929 QSGHLSR1134 117.5 315 GGAGATGCC 520 ERGTLAR 725 QSSNLQR 930 QSGHLSR 1135 265316 GGAGATGCC 521 ERGTLAR 726 QRDNLQR 931 QSGHLSR 1136 3000 318TAGGAGATGC 522 RSDALTS 727 RSDNLAR 932 RSDNLAS 1137 100 319 GGGGAAGGG523 KTSHLRA 728 QSGNLSR 933 RSDHLSR 1138 125 320 GGGGAAGGG 524 RSDHLTR729 QSGNLSR 934 RSDHLSR 1139 5 321 GGCGGAGAT 525 TTSNLRR 730 QSGHLQR 93SDRSHLTR 1140 200 323 GGCGGAGAT 526 TTSNLRR 731 QSGHLQR 936 DRDHLTR 1141600 324 GGCGGAGAT 527 TTSNLRR 732 QSGHLQR 937 DRDHLTR 1142 200 325GTATCTGCT 528 NSSDLTR 733 NSDVLTS 938 QSDVLTR 1143 1000 326 GTATCTGTT529 NSDALTR 734 NSDVLTS 939 QSDVLTR 1144 1000 327 TCTGCTGGG 530 RSDHLTR735 NSADLTR 940 NSDDLTR 1145 1000 328 TCTGTTGGG 531 RSDHLTR 736 NSSALTS941 NSDDLTR 1146 1000 349 GGTGTCGCC 532 DCRDLAR 737 DSGSLTR 942 TSGHLTR1147 1000 350 TCCGAGGGT 533 TSGHLTR 738 RSDNLTR 943 DCRDLTT 1148 332 351GCTGGTGTC 534 DSGSLTR 739 TSGHLTR 944 TLHTLTR 1149 1000 352 GGAGGGGTG535 RSDSLLR 740 RSDHLTR 945 QSDHLTR 1150 26 353 GTTGGAGCC 536 DCRDLAR741 QSDHLTR 946 TSGALTR 1151 1000 354 GAAGAGGAC 537 DSSNLTR 742 RSDNLTR947 QRSNLVR 1152 28 355 GAAGAGGAC 538 EKANLTR 743 RSDNLTR 948 QRSNLVR1153 20 356 GGCTGGGCG 539 RSDELRR 744 RSDHLTK 949 DSDHLSR 1154 1000 357GGCTGGGCG 540 RSDELRR 745 RSDHLTK 950 DSDHLSR 1155 1000 358 GGCTGGGCG541 RSDELRR 746 RSDHLTK 951 DSSHLSR 1156 225 361 GGGTTTGGG 542 
RSDHLTR747 QSSALTR 952 RSDHLTR 1157 130 363 GGGTTTGGG 543 RSDHLTR 748 QSSVLTR953 RSDHLTR 1158 200 364 GTGTCCGAAG 544 RSDNLTR 749 DSAVLTT 954 RSDSLTR1159 1000 365 GGTGCTGGT 545 QASHLTR 750 QASVLTR 955 QASHLTR 1160 600 366GAGGGTGCT 546 QASVLTR 751 QASHLTR 956 RSDNLTR 1161 1000 367 GGGGGCGGG547 RSDHLTR 752 DSGHLTR 957 RSDHLQR 1162 60 368 GAGGGGGCG 548 RSDELTR753 RSDHLTR 958 RSDNLTR 1163 3.5 369 GTAGTTGTG 549 RSDALTR 754 TGGSLAR959 QSGSLTR 1164 95 370 GTAGTTGTG 550 RSDALTR 755 NRATLAR 960 QSASLTR1165 300 371 GTAGTTGTG 551 RSDALTR 756 NRATLAR 961 QSGSLTR 1166 175 372GTAGTTGTG 552 RSDSLLR 757 TGGSLAR 962 QSASLTR 1167 112.5 373 GTAGTTGTG553 RSDSLLR 758 NRATLAR 963 QSASLTR 1168 320 374 GCTGAGGAA 554 QRSNLVR759 RSDNLTR 964 TSSELQR 1169 3.3 375 GAGGAAGAT 555 QQSNLAR 760 QSGNLQR965 RSDNLTR 1170 85 377 GTGTTGGCAG 556 QSGSLTR 761 RGDALTS 966 RSDALTR1171 89 378 GCCGAGGAGA 557 RSDNLTR 762 RSDNLTR 967 DRSSLTR 1172 31 379GCCGAGGAGA 558 RSDNLTR 763 RSDNLTR 968 ERGTLAR 1173 3 380 GAGTCGCAAG 559QSANLAR 764 RSDELTT 969 RSDNLAR 1174 1000 381 GCAGCTGCGC 560 RSDELTR 765QSSDLQR 970 QSGDLTR 1175 1.5 383 TGGTTGGTAT 561 QSATLAR 766 RGDALTS 971RSDHLTT 1176 1000 384 GTGGGCTTCA 562 DRSALTT 767 DRSHLAR 972 RSDALAR1177 60 385 GGGGCGGAGC 563 RSDNLTR 768 RSDTLKK 973 RSDHLSR 1178 1.2 386GGGGCGGAGC 564 RSDNLTR 769 RSDELQR 974 RSDHLSR 1179 0.4 387 GGCGAGGCAA565 QSGSLTR 770 RSDNLAR 975 DRSHLAR 1180 2.5 388 GGCGAGGCAA 566 QSGDLTR771 RSDNLAR 976 DRSHLAR 1181 28 390 GTGGCAGCGG 567 RSDTLKK 772 QSSDLQK977 RSDALAR 1182 20 392 GTGGCAGCGG 568 RSDELTR 773 QSSDLQK 978 RSDALAR1183 1000 396 GCGGGAGCAG 569 QSGSLTR 774 QSGHLQR 979 RSDTLKK 1184 18.8397 GCGGGAGCAG 570 QSGDLTR 775 QSGHLQR 980 RSDTLKK 118 525 400TCAGTGGTGG 571 RSDALAR 776 RSDSLAR 981 QSGDLRT 1186 40 405 GCGGCCGCA 572RSDELTR 777 ERGTLAR 982 RSDERKR 1187 110 406 GCGGCCGCA 573 RSDELTR 778DRSSLTR 983 RSDERKR 1188 110 407 GCGGCCGCA 574 QSWELTR 779 ERGTLAR 984RSDERKR 1189 410 408 GCGGCCGCA 575 QSWELTR 780 DRSSLTR 985 RSDERKR 1190380 409 GCGGCCGCA 576 QSGSLTR 
781 ERGTLAR 986 RSDERKR 1191 50 410GCAGAAGTC 577 RSDALTR 782 QSGNLTR 987 QSGSLTR 1192 3 411 GCGGCCGCA 578QSGSLTR 783 RSDHLTT 988 RSDERKR 1193 1000 412 GCGTGGGCG 579 QSGSLTR 784RSDHLTT 989 RSDERKR 1194 5 413 GCGTGGGCA 580 QSGSLTR 785 RSDHLTT 990RSDERKR 1195 5 414 GCAGAAGCA 581 RSDELTR 786 QSANLQR 991 QSGSLTR 11961000 415 GTGTGCGGA 582 DRSHLTR 787 ERHSLQT 992 RSDALTR 1197 1000 416TGTGCGGCC 583 ERGTLAR 788 RSDELRR 993 DRSHLQT 1198 1000 493 GGGGTGGCGG584 RSDTLKK 789 RSDSLAR 994 RSDHLSR 1199 300 494 GCCGAGGAGA 585 RSDNLTR790 RSDNLTR 995 DRSSLTR 1200 90 496 GGTGGTGGC 586 DTSHLRR 791 TSGHLQR996 TSGHLSR 1201 1000 497 GTTTGCGTC 587 ETASLRR 792 DSAHLQR 997 TSSALSR1202 1000 498 GAAGAGGCA 588 QTGELRR 793 RSDNLQR 998 QSGNLSR 1203 30 499GCTTGTGAG 589 RTSNLRR 794 TSSHLQK 999 DTDHLRR 1204 1000 500 GCTTGTGAG590 RSDNLTR 795 QSSNLQT 1000 DRSHLAR 1205 1000 501 GTGGGGGTT 591 NRATLAR796 RSDHLSR 1001 RSDALAR 1206 8 502 GGGGTGGGA 592 QSAHLAR 797 RSDALAR1002 RSDHLSR 1207 60 507 GAGGTAGAGG 593 RSDNLAR 798 QRSALAR 1003 RSDNLAR1208 10 508 GAGGTAGAGG 594 RSDNLAR 799 QSATLR 1004 RSDNLAR 1209 10 509GTCGTGTGGC 595 RSDHLTT 800 RSDALAR 1005 DRSALAR 1210 100 510 GTTGAGGAAG596 QSGNLAR 801 RSDNLAR 1006 NRATLAR 1211 100 511 GTTGAGGAAG 597 QSGNLAR802 RSDNLAR 1007 QSSALAR 1212 100 512 GAGGTGGAAG 598 QSGNLAR 803 RSDALAR1008 RSDNLAR 1213 10 513 GAGGTGGAAG 599 QSANLAR 804 RSDALAR 1009 RSDNLAR1214 1.5 514 TAGGTGGTGG 600 RSDALTR 805 RSDALAR 1010 RSDNLTT 1215 10 515TGGGAGGAGT 601 RSDNLTR 806 RSDNLTR 1011 RSDHLTT 1216 0.5 516 GGAGGAGCT602 TTSELRR 807 QSGHLQR 1012 QSGHLSR 1217 700 517 GGAGCTGGGG 603 RTDHLRR808 TSSELQR 1013 QSGHLSR 1218 50 518 GGGGGAGGAG 604 QTGHLRR 809 QSGHLQR1014 RSDHLSR 1219 30 519 GGGGAGGAGA 605 RSDNLAR 810 RSDNLSR 1015 RSDHLSR1220 0.3 520 GGAGGAGAT 606 TTANLRR 811 QSGHLQR 1016 QSGHLSR 1221 300 521GCAGCAGGA 607 QTGHLRR 812 QSGELQR 1017 QSGELSR 1222 1000 522 GATGAGGCA608 QTGELRR 813 RSDNLQR 1018 TSANLSR 1223 200 527 GGGGAGGATC 609 TTSNLRR814 RSSNLQR 1019 RSDHLSR 1224 2 528 GGGGAGGATC 610 
TTSNLRR 815 RSSNLQR1020 RSDHLSR 1225 10 529 GAGGCTTGGG 611 RTDHLRK 816 TSAELQR 1021 RSSNLSR1226 1000 531 GCGGAGGCTT 612 TTGELRR 817 RSSNLQR 1022 RSDELSR 1227 160532 GCGGAGGCTT 613 QSSDLQR 818 RSSNLQR 1023 RSDELSR 1228 100 533GCGGAGGCTT 614 QSSDLQR 819 RSDNLAR 1024 RSADLSR 1229 7 534 GCGGAGGCTT615 QSSDLQR 820 RSDNLAR 1025 RSDDLRR 1230 10 535 GCAGCCGGG 616 RTDHLRR821 ESSDLQR 1026 QSGELSR 1231 1000 538 GCAGAGGCTT 617 QSSDLQR 822RSDNLAR 1027 QSGSLTR 1232 70 540 TGGGCAGGCC 618 DRSHLTR 823 QSGSLTR 1028RSDHLTT 1233 55 541 GGGGAGGAT 619 TTSNLRR 824 RSSNLQR 1029 RSDHLSR 12343 570 GGGGAAGGCT 620 DSGHLTR 825 QRSNLVR 1030 RSDHLTR 1235 20 571GTGTGTGTGT 621 RSDSLTR 826 QRSNLVR 1031 RSDSLLR 1236 1000 572 GCATACGTGG622 RSDSLLR 827 DKGNLQS 1032 QSDDLTR 1237 1000 573 GCATACGTG 623 RSDSLLR828 DKGNLQS 1033 QSGDLTR 1238 1000 574 TACGTGGGGT 624 RSDHLTR 829RSDHLTR 1034 DKGNLQT 1239 25 575 TACGTGGGCT 625 DFSHLTR 830 RSDHLTR 1035DKGNLQT 1240 472 576 GAGGGTGTTG 626 NSDTLAR 831 TSGHLTR 1036 RSDNLTR1241 200 577 GGAGCGGGGA 627 RSDHLSR 832 RSDELQR 1037 QSDHLTR 1242 200579 GGGGTTGAGG 628 RSDNLTR 833 NRDTLAR 1038 TSGHLTR 1243 200 580GGTGTTGGAG 629 QRAHLAR 834 NRDTLAR 1039 TSGHLTR 1244 1000 581 TACGTGGGTT630 QSSHLTR 835 RSDSLLR 1040 DKGNLQT 1245 382 583 GTAGGGGTTG 631 NSSALTR836 RSDHLTR 1041 QSASLTR 1246 46 584 GAAGGCGGAG 632 QAGHLTR 837 DKSHLTR1042 QSGNLTR 1247 1000 585 GAAGGCGGAG 633 QAGHLTR 838 DSGHLTR 1043QSGNLTR 1248 1000 587 GGGGGTTACG 634 DKGNLQT 839 TSGHLTR 1044 RSDHLSK1249 500 588 GGGGGGGGGG 635 RSDHLSR 840 RSDHLTR 1045 RSDHLSK 1250 30 589GGAGTATGCT 636 DSGHLAS 841 QSATLAR 1046 QSDHLTR 1251 1000 595 TGGTTGGTAT637 QRGSLAR 842 RGDALTR 1047 RSDHLTT 1252 73.3 597 TGGTTGGTA 638 QNSAMRK843 RGDALTS 1048 RSDHLTT 1253 1000 598 TGGTTGGTA 639 QRGSLAR 844 RDGSLTS1049 RSDHLTT 1254 1000 599 TGGTTGGTA 640 QNSAMRK 845 RDGSLTS 1050RSDHLTT 1255 1000 600 GAGTCGGAA 641 QSANLAR 846 RSDELRT 1051 RSDNLAR1256 206.7 601 GAGTCGGAA 642 RSANLTR 847 RLDGLRT 1052 RSDNLAR 1257 606.7602 GAGTCGGAA 643 RSANLTR 848 RQDTLVG 
1053 RSDNLAR 1258 616.7 603 GAGTCGGAA 644 QSGNLAR 849 RSDELRT 1054 RSDNLAR 1259 166.7 606 GGGGAGGATC 645 TTSNLRR 850 RSDNLQR 1055 RSDHLSR 1260 0.2

[0117] TABLE 3
SBS# TARGET SEQ ID F1 SEQ ID F2 SEQ ID F3 SEQ ID Kd (nM)
897 GAGGAGGTGA 1261 RSDALAR 1347 RSDNLAR 1433 RSDNLVR 1519 0.07
828 GCGGAGGACC 1262 EKANLTR 1348 RSDNLAR 1434 RSDERKR 1520 0.1
884 GAGGAGGTGA 1263 RSDSLTR 1349 RSDNLAR 1435 RSDNLVR 1521 0.15
817 GAGGAGGTGA 1264 RSDSLTR 1350 RSDNLAR 1436 RSDNLAR 1522 0.31
666 GCGGAGGCGC 1265 RSDDLTR 1351 RSDNLTR 1437 RSDTLKK 1523 0.5
829 GCGGAGGACC 1266 EKANLTR 1352 RSDNLAR 1438 RSDTLKK 1524 0.52
670 GACGTGGAGG 1267 RSDNLAR 1353 RSDALAR 1439 DRSNLTR 1525 0.57
801 AAGGAGTCGC 1268 RSADLRT 1354 RSDNLAR 1440 RSDNLTQ 1526 0.85
668 GTGGAGGCCA 1269 ERGTLAR 1355 RSDNLAR 1441 RSDALAR 1527 1.13
895 ATGGATTCAG 1270 QSHDLTK 1356 TSGNLVR 1442 RSDALTQ 1528 1.4
799 GGGGGAGCTG 1271 QSSDLQR 1357 QRAHLER 1443 RSDHLSR 1529 1.85
798 GGGGGAGCTG 1272 QSSDLQR 1358 QSGHLER 1444 RSDHLSR 1530 3
842 GAGGTGGGCT 1273 DRSHLTR 1359 RSDALAR 1445 RSDNLAR 1531 5.4
894 TCAGTGGTAT 1274 QRSALAR 1360 RSDALSR 1446 QSHDLTK 1532 6.15
892 ATGGATTCAG 1275 QSHDLTK 1361 QQSNLVR 1447 RSDALTQ 1533 6.2
888 TCAGTGGTAT 1276 QSSSLVR 1362 RSDALSR 1448 QSHDLTK 1534 14
739 GCGGGCGGGC 1277 RSDHLTR 1363 ERGHLTR 1449 RSDDLRR 1535 16.5
850 CAGGCTGTGG 1278 RSDALTR 1364 QSSDLTR 1450 RSDNLRE 1536 17
797 GCAGAGGCTG 1279 QSSDLQR 1365 RSDNLAR 1451 QSGDLTR 1537 17.5
891 TCAGTGGTAT 1280 QSSSLVR 1366 RSDALSR 1452 QSGSLRT 1538 18.5
887 TCAGTGGTAT 1281 QRSALAR 1367 RSDALSR 1453 QSGDLRT 1539 23.75
672 TCGGACGTGG 1282 RSDALAR 1368 DRSNLTR 1454 RSDELRT 1540 24
836 GGGGAGGCCC 1283 ERGTLAR 1369 RSDNLAR 1455 RSDHLSR 1541 24.25
674 GCGGCGTCGG 1284 RSDELRT 1370 RADTLRR 1456 RSDTLKK 1542 27.5
849 GGGGCCCTGG 1285 RSDALRE 1371 DRSSLTR 1457 RSDHLTQ 1543 29.05
825 GAATGGGCAG 1286 QSGSLTR 1372 RSDHLTT 1458 QSGNLTR 1544 37.3
673 GCGGGTGTCT 1287 DRSALAR 1373 QSSHLAR 1459 RSDTLKK 1545 48.33
848 GGGGAGGCCC 1288 DRSSLTR 1374 RSDNLAR 1460 RSDHLSR 1546 49.5
662 AGAGCGGCAC 1289 QTGSLTR 1375 RSDELQR 1461 QSGHLNQ 1547 50
667 GAGTCGGACG 1290 DRSNLTR 1376 RSDELRT 1462 RSDNLAR 1548 50
803 GCAGCGGCTC 1291 QSSDLQR 1377 RSDELQR 1463 QSGSLTR 1549 57.5
671 TCGGACGAGT 1292 RSDNLAR 1378 DRSNLTR 1464 RSDELRT 1550 64
851 GAGATGGATC 1293 QSSNLQR 1379 RRDVLMN 1465 RLHNLQR 1551 74
804 GCAGCGGCTC 1294 QSSDLQR 1380 RSDDLNR 1466 QSGSLTR 1552 82.5
669 GACGAGTCGG 1295 RSDELRT 1381 RSDNLAR 1467 DRSNLTR 1553 90
682 GCTGCAGGAG 1296 RSDHLAR 1382 QSGDLTR 1468 QSSDLSR 1554 90
845 GAGATGGATC 1297 QSSNLQR 1383 RSDALRQ 1469 RLHNLQR 1555 112.5
663 AGAGCGGCAC 1298 QTGSLTR 1384 RSDELQR 1470 KNWKLQA 1556 115
738 GCGGGGTCCG 1299 ERGTLTT 1385 RSDHLSR 1471 RSDDLRR 1557 120
664 AGAGCGGCAC 1300 QTGSLTR 1386 RADTLRR 1472 ASSRLAT 1558 125
833 GACTAGGACC 1301 EKANLTR 1387 RSDNLTK 1473 DRSNLTR 1559 136
685 GCTGCAGGAG 1302 RSDHLAR 1388 QSGSLTR 1474 QSSDLSR 1560 150
835 TAGGGAGCGT 1303 RADTLRR 1389 QSGHLTR 1475 RSDNLTT 1561 150
847 TAGGGAGCGT 1304 RSDDLTR 1390 QSGHLTR 1476 RSDNLTT 1562 150
818 GAATGGGCAG 1305 QSGSLTR 1391 RSDHLTT 1477 QSSNLVR 1563 167
834 GACTAGGACC 1306 EKANLTR 1392 RSDHLTT 1478 DRSNLTR 1564 186
837 GGGGCCCTGG 1307 RSDALRE 1393 DRSSLTR 1479 RSDHLSR 1565 222
764 GCAGAGGCTG 1308 TSGELVR 1394 RSDNLAR 1480 QSGDLTR 1566 255
774 GCAGCGGTAG 1309 QRSALAR 1395 RSDELQR 1481 QSGDLTR 1567 258
765 GCCGAGGCCG 1310 ERGTLAR 1396 RSDNLAR 1482 ERGTLAR 1568 262.5
766 GCCGAGGCCG 1311 ERGTLAR 1397 RSDNLAR 1483 DRSDLTR 1569 262.5
775 GCAGCGGTAG 1312 QSGALTR 1398 RSDELQR 1484 QSGDLTR 1570 265
763 GCAGAGGCTG 1313 TSGELVR 1399 RSDNLAR 1485 QSGSLTR 1571 275
838 GGGGCCCTGG 1314 RSDALRE 1400 DRSSLTR 1486 RSDHLTA 1572 300
841 GAGTGTGAGG 1315 RSDNLAR 1401 QSSHLAS 1487 RSDNLAR 1573 300
770 TTGGCAGCCT 1316 DRSSLTR 1402 QSGSLTR 1488 RSDSLTK 1574 325
767 GGGGGAGCTG 1317 QSSDLAR 1403 QSGHLQR 1489 RSDHLSR 1575 335
800 TTGGCAGCCT 1318 ERGTLAR 1404 QSGSLTR 1490 RSDSLTK 1576 400
832 GACTAGGACC 1319 EKANLTR 1405 RSDNLTT 1491 DRSNLTR 1577 408
844 GAGATGGATC 1320 QSSNLQR 1406 RSDALRQ 1492 RSDNLQR 1578 444
683 GCTGCAGGAG 1321 QSGHLAR 1407 QSGSLTR 1493 QSSDLSR 1579 500
805 GCAGCGGTAG 1322 QRSALAR 1408 RSDELQR 1494 QSGSLTR 1580 500
839 GAGTGTGAGG 1323 RSDNLAR 1409 TSDHLAS 1495 RSDNLAR 1581 625
840 GAGTGTGAGG 1324 RSDNLAR 1410 MSHHLKT 1496 RSDNLAR 1582 625
830 GGAGAGTCGG 1325 RSDELRT 1411 RSDNLAR 1497 QRAHLAR 1583 683
831 GGAGAGTCGG 1326 RSDDLTK 1412 RSDNLAR 1498 QRAHLAR 1584 700
684 GCTGCAGGAG 1327 RSAHLAR 1413 QSGSLTR 1499 QSSDLSR 1585 850
846 GAGATGGATC 1328 QSSNLQR 1414 RRDVLMN 1500 RSDNLQR 1586 889.5
819 AAGTAGGGTG 1329 QSSHLTR 1415 RSDNLTT 1501 RSDNLTQ 1587 1000
820 ACGGTAGTTA 1330 QSSALTR 1416 QRSALAR 1502 RSDTLTQ 1588 1000
821 ACGGTAGTTA 1331 NRATLAR 1417 QRSALAR 1503 RSDTLTQ 1589 1000
822 GTGTGCTGGT 1332 RSDHLTT 1418 ERQHLAT 1504 RSDALAR 1590 1000
823 GTGTGCTGGT 1333 RSDHLTK 1419 ERQHLAT 1505 RSDALAR 1591 1000
824 GTGTGCTGGT 1334 RSDHLTT 1420 DRSHLRT 1506 RSDALAR 1592 1000
885 GTGTGCTGGT 1335 RSDHLTK 1421 DRSHLRT 1507 RSDALAR 1593 1000
886 TCAGTGGTAT 1336 QSSSLVR 1422 RSDALSR 1508 QSGDLRT 1594 1000
889 ATGGATTCAG 1337 QSGSLTT 1423 QQSNLVR 1509 RSDALTQ 1595 1000
890 CTGGTATGTC 1338 QRSHLTT 1424 QRSALAR 1510 RSDALRE 1596 1000
896 AAGTAGGGTG 1339 TSGHLVR 1425 RSDNLTT 1511 RSDNLTQ 1597 1000
898 ACGGTAGTTA 1340 NRATLAR 1426 QSSSLVR 1512 RSDTLTQ 1598 1000
899 CTGGTATGTC 1341 QRSHLTT 1427 QSSSLVR 1513 RSDALRE 1599 1000
900 CTGGTATGTC 1342 MSHHLKE 1428 QSSSLVR 1514 RSDALRE 1600 1000
901 CTGGTATGTC 1343 MSHHLKE 1429 QRSALAR 1515 RSDALRE 1601 1000
773 GCAGCGGTAG 1344 QSQALTR 1430 RSDELQR 1516 QSGSLTR 1602 1250
768 GGGGGAGCTG 1345 QSSDLAR 1431 QRAHLER 1517 RSDHLSR 1603 2000
681 GCTGCAGGAG 1346 RSAHLAR 1432 QSGDLTR 1518 QSSDLSR 1604 3000
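
Each record in Table 3 lists an SBS number, a target site, the recognition sequences of fingers F1 to F3 (each entry followed by its SEQ ID number), and a Kd in nM. As an illustration only, the following Python sketch shows one way such records might be read programmatically and ranked by Kd. The names `Record`, `RECORD`, and `parse_records` are hypothetical and not part of the disclosure. The pattern assumes seven-residue finger entries and a target of nine or ten nucleotides, and it tolerates a missing space between a number and an adjacent sequence; it will not correctly read records in which a SEQ ID number and a Kd value have run together.

```python
import re
from typing import NamedTuple


class Record(NamedTuple):
    """One table row (hypothetical container, for illustration only)."""
    sbs: int          # SBS number of the zinc finger protein
    target: str       # target site, written 5' to 3'
    fingers: tuple    # recognition sequences of F1, F2, F3
    kd_nm: float      # dissociation constant in nM


# SBS#, target, then (sequence, SEQ ID) pairs, then Kd.
# \s* allows for tokens that were fused during text extraction.
RECORD = re.compile(
    r"(?P<sbs>\d+)\s*(?P<target>[ACGT]{9,10})\s*\d+\s*"
    r"(?P<f1>[A-Z]{7})\s*\d+\s*"
    r"(?P<f2>[A-Z]{7})\s*\d+\s*"
    r"(?P<f3>[A-Z]{7})\s*\d+\s+"
    r"(?P<kd>\d+(?:\.\d+)?)"
)


def parse_records(text):
    """Return every record the pattern can find in the given text."""
    return [
        Record(int(m["sbs"]), m["target"],
               (m["f1"], m["f2"], m["f3"]), float(m["kd"]))
        for m in RECORD.finditer(text)
    ]


# First three records of Table 3.
SAMPLE = (
    "897 GAGGAGGTGA 1261 RSDALAR 1347 RSDNLAR 1433 RSDNLVR 1519 0.07 "
    "828GCGGAGGACC 1262 EKANLTR 1348 RSDNLAR 1434 RSDERKR 1520 0.1 "
    "884GAGGAGGTGA 1263 RSDSLTR 1349 RSDNLAR 1435 RSDNLVR 1521 0.15"
)

records = parse_records(SAMPLE)
# A lower Kd corresponds to tighter binding.
tightest = min(records, key=lambda r: r.kd_nm)
print(tightest.sbs, tightest.target, tightest.fingers, tightest.kd_nm)
```

Grouping the parsed records by target site would allow proteins that differ in a single finger to be compared directly, which is the kind of comparison the tables are arranged to support.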

[0118] TABLE 4 SEQ F1 SEQ F2 SEQ F3 SEQ Kd SBS# TARGET ID ID ID ID (nM)607 AAGGTGGCAG 1605 QSGDLTR 1707 RSDSLAR 1809 RLDNRTA 1911 6.5 608TTGGCTGGGC 1606 GSWHLTR 1708 QSSDLQR 1810 RSDSLTK 1912 8 611 GTGGCTGCAG1607 QSGDLTR 1709 QSSDLQR 1811 RSDALAR 1913 11.5 612 GTGGCTGCAG 1608QSGTLTR 1710 QSSDLQR 1812 RSDALAR 1914 0.38 613 TTGGCTGGGC 1609 RSDHLAR1711 QSSDLQR 1813 RGDALTS 1915 1.45 614 TTGGCTGGGC 1610 RSDHLAR 1712QSSDLQR 1814 RSDSLTK 1916 2 616 GAGGAGGATG 1611 QSSNLQR 1713 RSDNLAR1815 RSDNLQR 1917 0.08 617 AAGGGGGGG 1612 RSDHLSR 1714 RSDHLTR 1816RKDNMTA 1918 1 618 AAGGGGGGG 1613 RSDHLSR 1715 RSDHLTR 1817 RKDNMTQ 19190.55 619 AAGGGGGGG 1614 RSDHLSR 1716 RSDHLTR 1818 RKDNMTN 1920 1.34 620AAGGGGGGG 1615 RSDHLSR 1717 RSDHLTR 1819 RLDNRTA 1921 0.54 621 AAGGGGGGG1616 RSDHLSR 1718 RSDHLTR 1820 RLDNRTQ 1922 0.75 624 ACGGATGTCT 1617DRSALAR 1719 TSANLAR 1821 RSDTLRS 1923 7 628 TTGTAGGGGA 1618 RSDHLTR1720 RSDNLTT 1822 RGDALTS 1924 130 629 TTGTAGGGGA 1619 RSSHLTR 1721RSDNLTT 1823 RGDALTS 1925 150 630 CGGGGAGAGT 1620 RSDNLAR 1722 QSGHLQR1824 RSDHLRE 1926 37.5 646 TTGGTGGAAG 1621 QSGNLAR 1723 RSDALAR 1825RGDALTS 1927 35 647 TTGGTGGAAG 1622 QSANLAR 1724 RSDALAR 1826 RGDALTS1928 40 651 GTTGTGGAAT 1623 QSGNLSR 1725 RSDALAR 1827 NRATLAR 1929 67.5652 TAGGAGGCTG 1624 QSSDLQR 1726 RSDNLAR 1828 RSDNLTT 1930 1.5 653TAGGAGGCTG 1625 TTSDLTR 1727 RSDNLAR 1829 RSDNLTT 1931 5.5 654TAGGCATAAA 1626 QSGNLRT 1728 QSGSLTR 1830 RSDNLTT 1932 105 655TAGGCATAAA 1627 QSGNLRT 1729 QSSTLRR 1831 RSDNLTT 1933 1000 656TAGGCATAAA 1628 QSGNLRT 1730 QSGSLTR 1832 RSDNLTS 1934 540 657TAGGCATAAA 1629 QSGNLRT 1731 QSSTLRR 1833 RSDNLTS 1935 300 660GAGGGAGTTC 1630 NRATLAR 1732 QSGHLTR 1834 RSDNLAR 1936 8.25 661GAGGGAGTTC 1631 TTSALTR 1733 QSGHLTR 1835 RSDNLAR 1937 1.73 665GCGGAGGCGC 1632 RSDDVTR 1734 RSDNLTR 1836 RSDDLRR 1938 12.5 689AAGGCGGAGA 1633 RSDNLTR 1735 RSDELQR 1837 RLDNRTA 1939 82.5 692AAGGCGGAGA 1634 RSDNLTR 1736 RSDELQR 1838 RSDNLTQ 1940 51 693 AAGGCGGAGA1635 RSDNLTR 1737 RADTLRR 1839 RLDNRTA 1941 95 694 
AAGGCGGAGA 1636RSDNLTR 1738 RADTLRR 1840 RSDNLTQ 1942 28.5 695 GGGGGCGAGC 1637 RSSNLTR1739 DRSHLAR 1841 RSDHLTR 1943 850 697 TGAGCGGCGG 1638 RSDELTR 1740RSDELSR 1842 QSGHLTK 1944 200 698 TGAGCGGCGG 1639 RSDELTR 1741 RSDELSR1843 QSHGLTS 1945 300 699 GCGGCGGCAG 1640 QSGSLTR 1742 RSDDLQR 1844RSDERKR 1946 21.5 700 GCGGCGGCAG 1641 QSGDLTR 1743 RSDDLQR 1845 RSDERKR1947 45 701 GCAGCGGAGC 1642 RSDNLAR 1744 RSDELQR 1846 QSGSLTR 1948 50.5702 GCAGCGGAGC 1643 RSDNLAR 1745 RSDELQR 1847 QSGDLTR 1949 73.5 704AAGGTGGCAG 1644 QSGDLTR 1746 RSDSLAR 1848 RSDNLTQ 1950 5 705 GGGGTGGGGC1645 RSDHLAR 1747 RSDSLAR 1849 RSDHLSR 1951 0.01 706 GGGGTGGGGC 1646RSDHLAR 1748 RSDSLLR 1850 RSDHLSR 1952 0.05 708 GAGTCGGAA 1647 QSANLAR1749 RQDTLVG 1851 RSDNLAR 1953 300 709 GAGTCGGAA 1648 QSANLAR 1750RKDVLVS 1852 RSDNLAR 1954 400 710 GAGTCGGAA 1649 QSGNLAR 1751 RLDGLRT1853 RSDNLAR 1955 400 711 GAGTCGGAA 1650 QSGNLAR 1752 RQDTLVG 1854RSDNLAR 1956 400 712 GGTGAGGAGT 1651 RSDNLAR 1753 RSDNLAR 1855 MSDHLSR1957 9.5 713 GGTGAGGAGT 1652 RSDNLAR 1754 RSDNLAR 1856 MSHHLSR 1958 0.15714 TGGGTCGCGG 1653 RSDELRR 1755 DRSALAR 1857 RSDHLTT 1959 200 715TGGGTCGCGG 1654 RADTLRR 1756 DRSALAR 1858 RSDHLTT 1960 0.46 716TTGGGAGCAC 1655 QSGSLTR 1757 QSGHLQR 1859 RGDALTS 1961 200 717TTGGGAGCAC 1656 QSGSLTR 1758 QSGHLQR 1860 RSDALTK 1962 150 718TTGGGAGCAC 1657 QSGSLTR 1759 QSGHLQR 1861 RSDALTR 1963 107.5 719GGCATGGTGG 1658 RSDALTR 1760 RSDALTS 1862 DRSHLAR 1964 20 720 GAAGAGGATG1659 TTSNLAR 1761 RSDNLAR 1863 QSGNLTR 1965 1.6 722 ATGGGGGTGG 1660RSDALTR 1762 RSDHLTR 1864 RSDALRQ 1966 0.7 724 GGCATGGTGG 1661 RSDALTR1763 RSDALRQ 1865 DRSHLAR 1967 2.5 725 GCTTGAGTTA 1662 QSSALAR 1764QSGHLQK 1866 QSSDLQR 1968 3000 726 GAAGAGGATG 1663 QSSNLAR 1765 RSDNLAR1867 QSGNLTR 1969 1.5 727 GCGGTGGCTC 1664 QSSDLTR 1766 RSDALSR 1868RSDTLKK 1970 0.1 728 GGTGAGGAGT 1665 RSDNLAR 1767 RSDNLAR 1869 DSSKLSR1971 15 729 GGAGGGGAGT 1666 RSDNLAR 1768 RSDHLSR 1870 QSGHLAR 1972 1000730 TGGGTCGCGG 1667 RSDDLTR 1769 DRSALAR 1871 RSDHLTT 1973 1000 
731GTGGGGGAGA 1668 RSDNLAR 1770 RSDHLSR 1872 RSDALAR 1974 12 732 GCGGGTGGGG1669 RSDHLAR 1771 QSSHLAR 1873 RSDDLTR 1975 22.5 733 GCGGGTGGGG 1670RSDHLAR 1772 QSSHLAR 1874 RSDTLKK 1976 0.32 734 GGGGCTGGGT 1671 RSDHLAR1773 QSSDLSR 1875 RSDHLSR 1977 0.25 735 GCGGTGGCTC 1672 QSSDLTR 1774RSDALSR 1876 RSDERKR 1978 0.05 736 GAGGTGGGGA 1673 RSDHLAR 1775 RSDALSR1877 RSDNLSR 1979 0.47 737 GGAGGGGAGT 1674 RSDNLAR 1776 RSDHLSR 1878QRGHLSR 1980 1000 740 AAGGTGGCAG 1675 QSGSLTR 1777 RSDALAR 1879 RSDNRTA1981 12.5 741 AAGGCTGAGA 1676 RSDNLTR 1778 QSSDLQR 1880 RSDNLTQ 1982 15742 ACGGGGTTAT 1677 QRGALAS 1779 RSDHLSR 1881 RSDTLKQ 1983 29 743ACGGGGTTAT 1678 QRGALAS 1780 RSDHLSR 1882 RSDTLTQ 1984 10 744 ACGGGGTTAT1679 QRSALAS 1781 RSDHLSR 1883 RSDTLKQ 1985 8.33 745 ACGGGGTTAT 1680QRSALAS 1782 RSDHLSR 1884 RSDTLTQ 1986 12.5 746 CTGGAAGCAT 1681 QSGSLTR1783 QSGNLAR 1885 RSDALRE 1987 2.07 747 CTATTTTGGG 1682 RSDHLTT 1784QSSALRT 1886 QSGALRE 1988 2000 748 TTGGACGGCG 1683 DSGHLTR 1785 DRSNLER1887 RGDALTS 1989 112.3 749 TTGGACGGCG 1684 DRSHLTR 1786 DSSNLTR 1888RGDALTS 1990 11.33 750 GAGGGAGCGA 1685 RSDELTR 1787 QSAHLAR 1889 RSDNLAR1991 52 751 GGTGAGGAGT 1686 RSDNLAR 1788 RSDNLAR 1890 NRSHTAR 1992 7 752GAGGTGGGGA 1687 RSHHLAR 1789 RSDALSR 1891 RSDNLSR 1993 31 757 CGGGCGGCTG1688 QSSDLRR 1790 RSDELQR 1892 RSDHLRE 1994 14.5 758 CGGGCGGCTG 1689QSSDLRR 1791 RADTLRR 1893 RSDHLRE 1995 16.5 759 TTGGACGGCG 1690 DSGHLTR1792 DSSNLTR 1894 RGDALTS 1996 37 760 TTGGACGGCG 1691 DRSHLTR 1793DRSNLER 1895 RGDALTS 1997 148.5 761 GCGGTGGCTC 1692 QSSDLQR 1794 RSDALSR1896 RSDERKR 1998 6 762 GCGGTGGCTC 1693 QSSDLQR 1795 RSDALSR 1897RSDTLKK 1999 18 776 ATGGACGGGT 1694 RSDHLAR 1796 DRSNLER 1898 RSDSLNQ2000 0.4 777 ATGGACGGGT 1695 RSDHLAR 1797 DRSNLTR 1899 RSDALSA 2001 3.4779 CGGGGAGCAG 1696 QSGSLTR 1798 QSGHLTR 1900 RSDHLAE 2002 0.5 780CGGGGAGCAG 1697 QSGSLTR 1799 QSGHLTR 1901 RSDHLRA 2003 0.5 781GGGGAGCAGC 1698 RSSNLRE 1800 RSDNLAR 1902 RSDHLTR 2004 4.25 783TTGGGAGCGG 1699 RSDELTR 1801 QSGHLQR 1903 RGDALTS 2005 2000 
785 TTGGGAGCGG 1700 RSDTLKK 1802 QSGHLQR 1904 RSDALTS 2006 50 786 TTGGGAGCGG 1701 RSDTLKK 1803 QSGHLQR 1905 RGDALRS 2007 2000 787 AGGGAGGATG 1702 QSDNLAR 1804 RSDNLAR 1906 RSDHLTQ 2008 4 826 GAGGGAGCGA 1703 RSDELTR 1805 QSGHLAR 1907 RSDNLAR 2009 2.75 827 GAGGGAGCGA 1704 RADTLRR 1806 QSGHLAR 1908 RSDNLAR 2010 1.2 882 GCGTGGGCGT 1705 RSDELTR 1807 RSDHLTT 1909 RSDERKR 2011 0.01 883 GCGTGGGCGT 1706 RSDELTR 1808 RSDHLTT 1910 RSDERKR 2012 1

[0119] TABLE 5 SEQ SEQ SEQ SEQ Kd SBS# TARGET ID F1 ID F2 ID F3 ID (nM)903 ATGGAAGGG 2013 RSDHLAR 2513 QSGNLAR 3013 RSDALRQ 3513 1.027 904AAGGGTGAC 2014 DSSNLTR 2514 QSSHLAR 3014 RSDNLTQ 3514 1 905 GTGGTGGTG2015 RSSALTR 2515 RSDSLAR 3015 RSDSLAR 3515 1.15 908 AAGGTCTCA 2016QSGDLRT 2516 DRSALAR 3016 RSDNLRQ 3516 50 909 GTGGAAGAA 2017 QSGNLSR2517 QSGNLQR 3017 RSDALAR 3517 16.4 910 ATGGAAGAT 2018 QSSNLAR 2518QSGNLQR 3018 RSDALAQ 3518 0.03 911 ATGGGTGCA 2019 QSGSLTR 2519 QSSHLAR3019 RSDALAQ 3519 0.91 912 TCAGAGGTG 2020 RSDSLAR 2520 RSDNLTR 3020QSGDLRT 3520 0.135 914 CAGGAAAAG 2021 RSDNLTQ 2521 QSGNLAR 3021 RSDNLRE3521 1.26 915 CAGGAAAAG 2022 RSDNLRQ 2522 QSGNLAR 3022 RSDNLRE 352245.15 916 GAGGAAGGA 2023 QSGHLAR 2523 QSGNLAR 3023 RSDNLQR 3523 1.3 919TCATAGTAG 2024 RSDNLTT 2524 RSDNLRT 3024 QSGDLRT 3524 250 920 GATGTGGTA2025 QSSSLVR 2525 RSDSLAR 3025 TSANLSR 3525 4 921 AAGGTCTCA 2026 QSGDLRT2526 DPGALVR 3026 RSDNLRQ 3526 11 922 AAGGTCTCA 2027 QSHDLTK 2527DRSALAR 3027 RSDNLRQ 3527 4 923 AAGGTCTCA 2028 QSHDLTK 2528 DPGALVR 3028RSDNLRQ 3528 2 926 GTGGTGGTG 2029 RSDALTR 2529 RSDSLAR 3029 RSDSLAR 35297.502 927 CAGGTTGAG 2030 RSDNLAR 2530 TSGSLTR 3030 RSDNLRE 3530 3.61 928CAGGTTGAG 2031 RSDNLAR 2531 QSSALTR 3031 RSDNLRE 3531 25 929 CAGGTAGAT2032 QSSNLAR 2532 QSATLAR 3032 RSDNLRE 3532 1.3 931 GAGGAAGAG 2033RSDNLAR 2533 QSSNLVR 3033 RSDNLAR 3533 2 932 ATGGAAGGG 2034 RSDHLAR 2534QSSNLVR 3034 RSDALRQ 3534 797 933 GACGAGGAA 2035 QSANLAR 2535 RSDNLAR3035 DRSNLTR 3535 500 934 ATGGAAGAT 2036 QSSNLAR 2536 QSGNLQR 3036RSDALTS 3536 0.07 935 ATGGGTGCA 2037 QSGSLTR 2537 QSSHLAR 3037 RSDALTS3537 0.91 937 GTGGGGGCT 2038 QSSDLTR 2538 RSDHLTR 3038 RSDSLAR 3538 0.03938 GTGGGGGCT 2039 QSSDLRR 2539 RSDHLTR 3039 RSDSLAR 3539 0.049 939GGGGGCTGG 2040 RSDHLTT 2540 DRSHLAR 3040 RSDHLSK 3540 0.352 940GGGGGCTGG 2041 RSDHLTK 2541 DRSHLAR 3041 RSDHLSK 3541 1.5 941 GGGGCTGGG2042 RSDHLAR 2542 QSSDLRR 3042 RSDKLSR 3542 0.077 942 GGGGCTGGG 2043RSDHLAR 2543 QSSDLRR 3043 RSDHLSK 3543 0.13 943 GGGGCTGGG 2044 
RSDHLAR2544 TSGELVR 3044 RSDKLSR 3544 0.067 944 GGGGCTGGG 2045 RSDHLAR 2545TSGELVR 3045 RSDHLSK 3545 0.027 945 GGTGCGGTG 2046 RSDSLTR 2546 RADTLRR3046 MSHHLSR 3546 0.027 946 GGTGCGGTG 2047 RSDSLTR 2547 RSDVLQR 3047MSHHLSR 3547 0.027 947 GGTGCGGTG 2048 RSDSLTR 2548 RSDELQR 3048 QSSHLAR3548 0.013 948 GGTGCGGTG 2049 RSDSLTR 2549 RSDVLQR 3049 QSSHLAR 35490.017 962 GAGGCGGCA 2050 QSGSLTR 2550 RSDELQR 3050 RSDNLAR 3550 0.015963 GAGGCGGCA 2051 QSGSLTR 2551 RSDDLQR 3051 RSDNLAR 3551 0.015 964GCGGCGGTG 2052 RSDALAR 2552 RSDELQR 3050 RSDERKR 3552 0.041 965GCGGCGGCC 2053 ERGDLTR 2553 RSDELQR 3053 RSDERKR 3553 3.1 966 GAGGAGGCC2054 ERGTLAR 2554 RSDNLSR 3054 RSDNLAR 3554 0.028 967 GAGGAGGCC 2055DRSSLTR 2555 RSDNLSR 3055 RSDNLAR 3555 0.055 968 GAGGCCGCA 2056 QSGSLTR2556 DRSSLTR 3056 RSDNLAR 3556 1.4 969 GAGGCCGCA 2057 QSGSLTR 2557DRSDLTR 3057 RSDNLAR 3557 0.275 970 GTGGGCGCC 2058 ERGTLAR 2558 DRSHLAR3058 RSDALAR 3558 1.859 971 GTGGGCGCC 2059 DRSSLTR 2559 DRSHLAR 3059RSDALAR 3559 0.144 972 GTGGGCGCC 2060 ERGDLTR 2560 DRSHLAR 3060 RSDALAR3560 1.748 973 GCCGCGGTC 2061 DRSALTR 2561 RSDELQR 3061 ERGTLAR 3561 0.6974 GCCGCGGTC 2062 DRSALTR 2562 RSDELQR 3062 DRSDLTR 3562 0.038 975CAGGCCGCT 2063 QSSDLTR 2563 DRSSLTR 3063 RSDNLRE 3563 1.1 976 CAGGCCGCT2064 QSSDLTR 2564 DRSDLTR 3064 RSDNLRE 3564 4.12 977 CTGGCAGTG 2065RSDSLTR 2565 QSGSLTR 3065 RSDALRE 3565 0.017 978 CTGGCAGTG 2066 RSDSLTR2566 QSGDLTR 3066 RSDALRE 3566 1.576 979 CTGGCGGCG 2067 RSSDLTR 2567RSDELQR 3067 RSDALRE 3567 1.59 980 CTGGCGGCG 2068 RSDDLTR 2568 RSDELQR3068 RSDALRE 3568 2.2 981 CAGGCGGCG 2069 RSDDLTR 2569 RSDELQR 3069RSDNLRE 3569 0.375 982 CCGGGCTGG 2070 RSDHLTT 2570 DRSHLAR 3070 RSDELRE3570 0.03 983 CCGGGCTGG 2071 RSDHLTK 2571 DRSHLAR 3071 RSDELRE 35711.385 984 GACGGCGAG 2072 RSDNLAR 2572 DRSHLAR 3072 DRSNLTR 3572 1.6 985GACGGCGAG 2073 RSDNLAR 2573 DRSHLAR 3073 EKANLTR 3573 0.965 986GGTGCTGAT 2074 QSSNLQR 2574 QSSDLQR 3074 MSHHLSR 3574 1.6 987 GGTGCTGAT2075 QSSNLQR 2575 QSSDLQR 3075 TSGHLVR 3575 33.55 988 GGTGCTGAT 
2076TSGNLVR 2576 QSSDLQR 3076 MSHHLSR 3576 0.15 989 GGTGAGGGG 2077 RSDHLAR2577 RSDNLAR 3077 MSHHLSR 3577 1.9 990 AAGGTGGGC 2078 DRSHLTR 2578RSDSLAR 3078 RSDNLTQ 3578 5.35 991 AAGGTGGGC 2079 DRSHLTR 2579 SSGSLVR3079 RSDNLTQ 3579 0.06 993 GGGGCTGGG 2080 RSDHLAR 2580 TSGELVR 3080RSDHLSR 3580 3.1 994 GGGGGCTGG 2081 RSDHLTK 2581 DRSHLAR 3081 RSDHLSR3581 0.03 995 GGGGAGGAA 2082 QSANLAR 2582 RSDNLAR 3082 RSDHLSK 3582 0.08996 CAGTTGGTC 2083 DRSALAR 2583 RSDALTS 3083 RSDNLRE 3583 9.6 997AGAGAGGCT 2084 QSSDLTR 2584 RSDNLAR 3084 QSGHLNQ 584 1.65 998 ACGTAGTAG2085 RSANLRT 2585 RSDNLTK 3085 RSDTLKQ 3585 0.23 999 AGAGAGGCT 2086QSSDLTR 2586 RSDNLAR 3086 QSGKLTQ 3586 0.6 1000 CAGTTGGTC 2087 DRSALAR2587 RSDALTR 3087 RSDNLRE 3587 11.15 1001 GGAGCTGAC 2088 EKANLTR 2588QSSDLSR 3088 QRAHLAR 3588 1.8 1002 GCGGAGGAG 2089 RSDNLVR 2589 RSDNLAR3089 RSDERKR 3589 0.028 1003 ACGTAGTAG 2090 RSANLRT 2590 RSDNLTK 3090RSDTLRS 3590 0.118 1004 ACGTAGTAG 2091 RSDNLTT 2591 RSDNLTK 3091 RSDTLRS3591 1.4 1006 GTAGGGGCG 2092 RSDDLTR 2592 RSDHLTR 3092 QRASLTR 35920.898 1007 GAGAGAGAT 2093 QSSNLQR 2593 QSGHLTR 3093 RLHNLAR 3593 1671008 GAGATGGAG 2094 RSDNLSR 2594 RSDSLTQ 3094 RLHNLAR 3594 0.4 1009GAGATGGAG 2095 RSDNLSR 2595 RSDSLTQ 3095 RSDNLSR 3595 1.9 1010 GAGAGAGAT2096 QSSNLQR 2596 QSGHLTR 3096 RSDNLAR 3596 8.2 1011 TTGGTGGCG 2097RSADLTR 2597 RSDSLAR 3097 RSDSLTK 3597 0.03 1012 GACGTAGGG 2098 RSDHLTR2598 QSSSLVR 3098 DRSNLTR 3598 0.032 1013 GAGAGAGAT 2099 QSSNLQR 2599QSGHLNQ 3099 RSDNLAR 3599 0.15 1014 GACGTAGGG 2100 RSDHLTR 2600 QSGSLTR3100 DRSNLTR 3600 0.01 1015 GCGGAGGAG 2101 RSDNLVR 2601 RSDNLAR 3101RSDTLKK 3601 0.008 1016 CAGTTGGTC 2102 DRSALAR 2602 RSDSLTK 3102 RSDNLRE3602 0.09 1017 CTGGATGAC 2103 EKANLTR 2603 TSGNLVR 3103 RSDALRE 36030.233 1018 GTAGTAGAA 2104 QSANLAR 2604 QSSSLVR 3104 QRASLAR 3604 7.21019 AGGGAGGAG 2105 RSDNLAR 2605 RSDNLAR 3105 RSDHLTQ 3605 0.022 1020ACGTAGTAG 2106 RSDNLTT 2606 RSDNLTK 3106 RSDTLKQ 3606 0.69 1022GAGGAGGTG 2107 RSDALAR 2607 RSDNLAR 3107 RSDNLAR 3607 0.01 
1024GGGGAGGAA 2108 QSANLAR 2608 RSDNLAR 3108 RSDHLSR 3608 0.08 1025GAGGAGGTG 2109 QSSALTR 2609 QSSSLVR 3109 RSDTLTQ 3609 0.115 1026GTGGCTTGT 2110 MSHHLKE 2610 QSSDLSR 3110 RSDALAR 3610 0.076 1027GCGGCGGTG 2111 RSDALAR 2611 RSDELQR 3111 RSDELQR 3611 0.054 1032GGTGCTGAT 2112 TSGNLVR 2612 QSSDLQR 3112 TSGHLVR 3612 0.52 1033GTGTTCGTG 2113 RSDALAR 2613 DRSALTT 3113 RSDALAR 3613 685.2 1034GTGTTCGTG 2114 RSDALAR 2614 DRSALTK 3114 RSDALAR 3614 14.55 1035GTGTTCGTG 2115 RSDALAR 2615 DRSALRT 3115 RSDALAR 3615 56 1037 GTAGGGGCA2116 QSGSLTR 2616 RSDHLSR 3116 QRASLAR 3616 0.05 1038 GTAGGGGCA 2117QTGELRR 2617 RSDHLSR 3117 QRASLAR 3617 0.152 1039 GGGGCTGGG 2118 RSDHLSR2618 TSGELVR 3118 RSDHLTR 3618 1.37 1040 GGGGCTGGG 2119 RSDHLSR 2619QSSDLQR 3119 RSDHLSK 3619 0.05 1041 TCATAGTAG 2120 RSDNLTT 2620 RSDNLRT3120 QSHDLTK 3620 2.06 1043 CAGGGAGAG 2121 RSDNLAR 2621 QSGHLTR 3121RSDNLRE 3621 0.16 1044 CAGGGAGAG 2122 RSDNLAR 2622 QRAHLER 3122 RSDNLRE3622 1.07 1045 GGGGCAGGA 2123 QSGHLAR 2623 QSGSLTR 3123 RSDHLSR 36230.15 1046 GGGGCAGGA 2124 QSGHLAR 2624 QSGDLRR 3124 RSDHLSR 3624 0.091047 GGGGCAGGA 2125 QRAHLER 2625 QSGSLTR 3125 RSDHLSR 3625 24.7 1048CAGGCTGTA 2126 QSGALTR 2626 QSSDLQR 3126 RSDNLRE 3626 1.387 1049CAGGCTGTA 2127 QRASLAR 2627 QSSDLQR 3127 RSDNLRE 3627 55.6 1050CAGGCTGTA 2128 QSSSLVR 2628 QSSDLQR 3128 RSDNLRE 3628 0.125 1051GAGGCTGAG 2129 RSDNLTR 2629 QSSDLQR 3129 RSDNLVR 3629 0.02 1052TAGGACGGG 2130 RSDHLAR 2630 EKANLTR 3130 RSDNLTT 3630 0.28 1053TAGGACGGG 2131 RSDHLAR 2631 DRSNLTR 3131 RSDNLTT 3631 0.025 1054GCTGCAGGG 2132 RSDHLAR 2632 QSGSLTR 3132 QSSDLQR 3632 0.033 1055GCTGCAGGG 2133 RSDHLAR 2633 QSGSLTR 3133 TSGDLTR 3633 18.73 1056GCTGCAGGG 2134 RSDHLAR 2634 QSGSLTR 3134 QSSDLQR 3634 0.045 1057GCTGCAGGG 2135 RSDHLAR 2635 QSGDLTR 3135 TSGDLTR 3635 0.483 1058GGGGCCGCG 2136 RSDELTR 2636 DRSSLTR 3136 RSDHLSR 3636 6.277 1059GGGGCCGCG 2137 RSDELTR 2637 DRSDLTR 3137 RSDHLSR 3637 0.152 1060GCGGAGGCC 2138 ERGTLAR 2638 RSDNLAR 3138 RSDERKR 3638 0.69 1061GTTGCGGGG 2139 RSDHLAR 2639 
RSDELQR 3139 QSSALTR 3639 0.165 1062GTTGCGGGG 2140 RSDHLAR 2640 RSDELQR 3140 TSGSLTR 3640 0.068 1063GTTGCGGGG 2141 RSDHLAR 2641 RSDELQR 3141 MSHALSR 3641 0.96 1064GCGGCAGTG 2142 RSDALTR 2642 QSGSLTR 3142 RSDERKR 3642 0.453 1065TGGGGCGGG 2143 RSDHLAR 2643 DRSHLAR 3143 RSDHLTT 3643 1.37 1066GAGGGCGGT 2144 QSSHLTR 2644 DRSHLAR 3144 RSDNLVR 3644 0.15 1067GAGGGCGGT 2145 TSGHLVR 2645 DRSHLAR 3145 RSDNLVR 3645 1.37 1068GCAGGGGGC 2146 DRSHLTR 2646 RSDHLTR 3146 QSGDLTR 3646 2.05 1069GCAGGCGGT 2147 DRSHLTR 2647 RSDHLTR 3147 QSGSLTR 3647 0.1 1070 GGGGCAGGC2148 DRSHLTR 2648 QSGSLTR 3148 RSDHLSR 3648 0.456 1071 GGGGCAGGC 2149DRSHLTR 2649 QSGDLTR 3149 RSDHLSR 3649 0.2 1072 GGATTGGCT 2150 QSSDLTR2650 RSDALTT 3150 QRAHLAR 3650 0.46 1073 GGATTGGCT 2151 QSSDLTR 2651RSDALTK 3151 QRAHLAR 3651 1.37 1075 GTGTTGGCG 2152 RSDELTR 2652 RSDALTK3152 RSDALTR 3652 0.915 1076 GCGGCAGCG 2153 RSDELTR 2653 QSGSLTR 3153RSDERKR 3653 4.1 1077 GCGGCAGCG 2154 RSDELTR 2654 QSGDLRR 3154 RSDERKR3654 6.2 1078 GGGGGGGCC 2155 ERGTLAR 2655 RSDHLSR 3155 RSDHLSR 3655 0.21079 GGGGGGGCC 2156 ERGDLTR 2656 RSDHLSR 3156 RSDHLSR 3656 4.1 1080CTGGAGGCG 2157 RSDELTR 2657 RSDNLAR 3157 RSDALRE 3657 1.37 1081GGGGAGGTG 2158 RSDALTR 2658 RSDNLTR 3158 RSDHLSR 3658 0.05 1082CTGGCGGCG 2159 RSDELTR 2659 RSDELTR 3159 RSDALRE 3659 0.152 1083CTGGTGGCA 2160 QSGDLTR 2660 RSDALSR 3160 RSDALRE 3660 0.152 1084GGTGAGGCG 2161 RSDELTR 2661 RSDNLAR 3161 MSHHLSR 3661 0.5 1085 GGTGAGGCG2162 RSDELTR 2662 RSDNLAR 3162 QSSHLAR 3662 0.46 1086 GGGGCTGGG 2163RSDHLSR 2663 QSSDLQR 3163 RSDHLTR 3663 0.1 1087 CGGGCGGCC 2164 ERGDLTR2664 RSDELQR 3164 RSDHLAE 3664 1.24 1088 CGGGCGGCC 2165 ERGDLTR 2665RSDELQR 3165 RSDHLRE 3665 0.905 1089 GACGAGGCT 2166 QSSDLRR 2666 RSDNLAR3166 DRSNLTR 3666 0.171 1090 AAGGCGCTG 2167 RSDALRE 2667 RSDELQR 3167RSDNLTQ 3667 30.3 1091 GTAGAGGAC 2168 DRSNLTR 2668 RSDNLAR 3168 QRASLAR3668 0.085 1092 GCCTTGGCT 2169 QSSDLRR 2669 RGDALTS 3169 DRSDLTR 36692.735 1093 GCGGAGTCG 2170 RSADLRT 2670 RSDNLAR 3170 RSDERKR 3670 0.0461094 
GCGGTTGGT 2171 TSGHLVR 2671 QSSALTR 3171 RSDERKR 3671 12.34 1095GGGGGAGCC 2172 ERGDLTR 2672 QRAHLER 3172 RSDHLSR 3672 0.395 1096GGGGGAGCC 2173 DRSSLTR 2673 QRAHLER 3173 RSDHLSR 3673 0.019 1097GAGGCCGAA 2174 QSANLAR 2674 DCRDLAR 3174 RSDNLAR 3674 0.77 1098GCCGGGGAG 2175 RSDNLTR 2675 RSDHLTR 3175 DRSDLTR 3675 0.055 1099GCGGAGTCG 2176 TSGHLVR 2676 TSGSLTR 3176 RSDERKR 3676 0.45 1100GTGTTGGTA 2177 QSGALTR 2677 RGDALTS 3177 RSDALTR 3677 1.4 1101 ATGGGAGTT2178 TTSALTR 2678 QPAHLER 3178 RSDALRQ 3678 0.065 1102 AAGGCAGAA 2179QSANLAR 2679 QSGSLTR 3179 RSDNLTQ 3679 8.15 1103 AAGGCAGAA 2180 QSANLAR2680 QSGDLTR 3180 RSDNLTQ 3680 1.4 1104 CGGGCAGCT 2181 QSSDLRR 2681QSGSLTR 3181 RSDHLRE 3681 0.08 1105 CTGGCAGCC 2182 ERGDLTR 2682 QSGDLTR3182 RSDALRE 3682 2.45 1106 CTGGCAGCC 2183 DRSSLTR 2683 QSGDLTR 3183RSDALRE 3683 0.19 1107 GCGGGAGTT 2184 QSSALAR 2684 QRAHLER 3184 RSDERKR3684 0.06 1108 CAGGCTGGA 2185 QSGHLAR 2685 TSGELVR 3185 RSDNLRE 36850.007 1109 AGGGGAGCC 2186 ERGDLTR 2686 QRAHLER 3186 RSDHLTQ 3686 0.3471110 AGGGGAGCC 2187 DRSSLTR 2687 QRAHLER 3187 RSDHLTQ 3687 0.095 1111CTGGTAGGG 2188 RSDHLAR 2688 QSSSLVR 3188 RSDALRE 3688 0.095 1112CTGGTAGGG 2189 RSDHLAR 2689 QSATLAR 3189 RSDALRE 3689 0.125 1113CTGGGGGCA 2190 QSGDLTR 2690 RSDHLTR 3190 RSDALRE 3690 0.06 1114CAGGTTGAT 2191 QSSNLAR 2691 TSGSLTR 3191 RSDNLRE 3691 2.75 1115CAGGTTGAT 2192 QSSNLAR 2692 QSSALTR 3192 RSDNLRE 3692 0.7 1116 CCGGAAGCG2193 RSDELTR 2693 QSSNLVR 3193 RSDELRE 3693 12.3 1117 GCAGCGCAG 2194RSSNLRE 2694 RSDELTR 3194 QSGSLTR 3694 2.85 1118 TAGGGAGTC 2195 DRSALTR2695 QRAHLER 3195 RSDNLTT 3695 1.4 1119 TGGGAGGGT 2196 TSGHLVR 2696RSDNLAR 3196 RSDHLTT 3696 0.1 1120 AGGGACGCG 2197 RSDELTR 2697 DRSNLTR3197 RSDHLTQ 3697 2.735 1121 CTGGTGGCC 2198 ERGDLTR 2698 RSDALTR 3198RSDALRE 3698 2.76 1122 CTGGTGGCC 2199 DRSSLTR 2699 RSDALTR 3199 RSDALRE3699 0.101 1123 TAGGAAGCA 2200 QSGSLTR 2700 QSGNLAR 3200 RSDNLTT 37000.065 1124 GTGGATGGA 2201 QSGHLAR 2701 TSGNLVR 3201 RSDALTR 3701 0.1011126 TTGGCTATG 2202 RSDALTS 2702 TSGELVR 
3202 RGDALTS 3702 0.46 1127CAGGGGGTT 2203 QSSALAR 2703 RSDHLTR 3203 RSDNLRE 3703 0.1 1128 AAGGTCGCC2204 ERGDLTR 2704 DPGALVR 3204 RSDNLTQ 3704 5.45 1130 GGTGCAGAC 2205DRSNLTR 2705 QSGDLTR 3205 MSHHLSR 3705 0.1 1131 GTGGGAGCC 2206 ERGDLTR2706 QRAHLER 3206 RSDALTR 3706 0.95 1132 GGGGCTGGA 2207 QSGHLAR 2707TSGELVR 3207 RSDHLSR 3707 0.055 1133 GGGGCTGGA 2208 QRAHLER 2708 TSGELVR3208 RSDHLSR 3708 0.5 1134 TGGGGGTGG 2209 RSDHLTT 2709 RSDHLTR 3209RSDHLTT 3709 0.067 1135 GCGGCGGGG 2210 RSDHLAR 2710 RSDELQR 3210 RSDERKR3710 0.025 1136 CCGGGAGTG 2211 RSDALTR 2711 QRAHLER 3211 RSDTLRE 37110.225 1137 CCGGGAGTG 2212 RSSALTR 2712 QRAHLER 3212 RSDTLRE 3712 0.0851138 CAGGGGGTA 2213 QSGALTR 2713 RSDHLTR 3213 RSDNLRE 3713 0.027 1139ACGGCCGAG 2214 RSDNLAR 2714 DRSDLTR 3214 RSDTLTQ 3714 0.535 1140AAGGGTGCG 2215 RSDELTR 2715 QSSHLAR 3215 RSDNLTQ 3715 0.3 1141 ATGGACTTG2216 RGDALTS 2716 DRSNLTR 3216 RSDALTQ 3716 1.7 1148 TTGGAGGAG 2217RSDNLTR 2717 RSDNLTR 3217 RGDALTS 3717 0.006 1149 TTGGAGGAG 2218 RSDNLTR2718 RSDNLTR 3218 RSDALTK 3718 0.004 1150 GAAGAGGCA 2219 QSGSLTR 2719RSDNLTR 3219 QSGNLTR 3719 0.004 1151 GTAGTATGG 2220 RSDHLTT 2720 QRSALAR3220 QRASLAR 3720 1.63 1152 AAGGCTGGA 2221 QSGHLAR 2721 TSGELVR 3221RSDNLTQ 3721 1.605 1153 AAGGCTGGA 2222 QRAHLAR 2722 TSGELVR 3222 RSDNLTQ3722 8.2 1154 CTGGCGTAG 2223 RSDNLTT 2723 RSDELQR 3223 RSDALRE 3723 1.041156 ATGGTTGAA 2224 QSANLAR 2724 QSSALTR 3224 RSDALRQ 3724 7.2 1157ATGGTTGAA 2225 QSANLAR 2725 TSGSLTR 3225 RSDALRQ 3725 0.885 1158AGGGGAGAA 2226 QSANLAR 2726 QSGHLTR 3226 RSDHLTQ 3726 0.1 1159 AGGGGAGAA2227 QSANLAR 2727 QRAHLER 3227 RSDHLTQ 3727 0.555 1160 TGGGAAGGC 2228DRSHLAR 2728 QSSNLVR 3228 RSDHLTT 3728 0.415 1161 GAGGCCGGC 2229 DRSHLAR2729 DRSDLTR 3229 RSDNLAR 3729 0.45 1162 GTGTTGGTA 2230 QSGALTR 2730RADALMV 3230 RSDALTR 3730 0.465 1163 GTGTGAGCC 2231 ERGDLTR 2731 QSGHLTT3231 RSDALTR 3731 1.45 1164 GTGTGAGCC 2232 ERGDLTR 2732 QSVHLQS 3232RSDALTR 3732 15.4 1165 GCGAAGGTG 2233 RSDALTR 2733 RSDNLTQ 3233 RSDERKR3733 1.4 1166 GCGAAGGTG 
2234 RSDALTR 2734 RSDNLTQ 3234 RSSDRKR 37340.195 1167 GCGAAGGTG 2235 RSDALTR 2735 RSDNLTQ 3235 RSHDRKR 3735 0.951168 AAGGCGCTG 2236 RSDALRE 2736 RSSDLTR 3236 RSDNLTQ 3736 2.8 1169GTAGAGGAC 2237 DRSNLTR 2737 RSDNLAR 3237 QSSSLVR 3737 0.053 1170GCCTTGGCT 2238 QSSDLRR 2738 RADALMV 3238 DRSDLTR 3738 2.75 1171GCGGAGTCG 2239 RSDDLRT 2739 RSDNLAR 3239 RSDERKR 3739 0.18 1172GCCGGGGAG 2240 RSDNLTR 2740 RSDHLTR 3240 ERGDLTR 3740 0.01 1173GCTGAAGGG 2241 RSDHLSR 2741 QSGNLAR 3241 QSSDLRR 3741 0.008 1174GCTGAAGGG 2242 RSDHLSR 2742 QSSNLVR 3242 QSSDLRR 3742 0.018 1175AAGGTCGCC 2243 DRSDLTR 2743 DPGALVR 3243 RSDNLTQ 3743 8.9 1176 GTGGGAGCC2244 DRSDLTR 2744 QPAHLER 3244 RSDALTR 3744 4.1 1177 CCGGGCGCA 2245QSGSLTR 2745 DRSHLAR 3245 RSDTLRE 3745 4.1 1178 GAGGATGGC 2246 DRSHLAR2746 TSGNLVR 3246 RSDNLAR 3746 0.085 1179 GCAGCGCAG 2247 RSSNLRE 2747RSSDLTR 3247 QSGSLTR 3747 2.735 1180 AAGGAAAGA 2248 QSGHLNQ 2748 QSGNLAR3248 RSDNLTQ 3748 4.825 1181 TTGGCTATG 2249 RSDALRQ 2749 TSGELVR 3249RGDALTS 3749 8.2 1182 CAGGAAGGC 2250 DRSHLAR 2750 QSGNLAR 3250 RSDNLRE3750 1.48 1183 CAGGAAGGC 2251 DRSHLAR 2751 QSSNLVR 3251 RSDNLRE 37511.935 1184 AAGGAAAGA 2252 KNWKLQA 2752 QSGNLAR 3252 RSDNLTQ 3752 2.7851185 AAGGAAAGA 2253 KNWKLQA 2753 QSHNLAR 3253 RSDNLTQ 3753 5.25 1186GCCGAGGTG 2254 RSDSLLR 2754 RSKNLQR 3254 ERGTLAR 3754 27.5 1187CTGGTGGGC 2255 DRSHLAR 2755 RSDALTR 3255 RSDALRE 3755 0.006 1188GTAGTATGG 2256 RSDHLTT 2756 QSSSLVR 3256 QRASLAR 3756 2.74 1189ATGGTTGAA 2257 QSANLAR 2757 TSGALTR 3257 RSDALRQ 3757 1.51 1190ATGGCAGTG 2258 RSDALTR 2758 QSGDLTR 3258 RSDSLNQ 3758 1.484 1191ATGGCAGTG 2259 RSDALTR 2759 QSGSLTR 3259 RSDSLNQ 3759 5.325 1192ATGGCAGTG 2260 RSDALTR 2760 QSGDLTR 3260 RSDALTQ 3760 2.364 1193ATGGCAGTG 2261 RSDALTR 2761 QSGSLTR 3261 RSDALTQ 3761 3.125 1194GAGAAGGTG 2262 RSDALTR 2762 RSDNRTA 3262 RSDNLTR 3762 2.19 1195GAGAAGGTG 2263 RSDALTR 2763 RSDNRTA 3263 RSSNLTR 3763 2.8 1197 GAAGGTGCC2264 ERGDLTR 2764 MSHHLSR 3264 QSGNLTR 3764 14.8 1199 ATGGAGAAG 2265RSDNRTA 2765 RSDNLTR 3265 
RSDALTQ 3765 3.428 1200 ATGGAGAAG 2266 RSDNRTA2766 RSSNLTR 3266 RSDALTQ 3766 16.87 1201 ATGGAGAAG 2267 RSDNRTA 2767RSHNLTR 3267 RSDALTQ 3767 14.8 1202 CTGGAGTAC 2268 DRSNLRT 2768 RSDNLTR3268 RSDALRE 3768 2.834 1203 GGAGTACTG 2269 RSDALRE 2769 QRSALAR 3269QRAHLAR 3769 2.945 1204 GGAGTACTG 2270 RSDALRE 2770 QSSSLVR 3270 QRAHLAR3770 4.38 1205 CGGGCAGCT 2271 QSSDLRR 2771 QSGDLTR 3271 RSDHLRE 3771 0.91206 GCGGGAGTT 2272 TTSALTR 2772 QRAHLER 3272 RSDERKR 3772 0.034 1207CAGGCTGGA 2273 QRAHLER 2773 TSGELVR 3273 RSDNLRE 3773 0.45 1209CCGGAAGCG 2274 RSDELTR 2774 QSSNLVR 3274 RSDTLRE 3774 19.28 1211GCAGCGCAG 2275 RSDNLRE 2775 RSDELTR 3275 QSGSLTR 3775 6.5 1212 CAGGGGGTT2276 TTSALTR 2776 RSDHLTR 3276 RSDNLRE 3776 0.05 1213 GAAGAAGAG 2277RSDNLTR 2777 QSSNLVR 3277 QSGNLTR 3777 12.3 1214 ATGGGAGTT 2278 TTSALTR2778 QRAHLER 3278 RSDALTQ 3778 0.46 1215 GTGGGGGCT 2279 QSSDLRR 2779RSDHLTR 3279 RSDALTR 3779 0.003 1217 GAAGAGGCA 2280 QSGSLTR 2780 RSDNLTR3280 QSANLTR 3780 0.004 1218 GCGGTGAGG 2281 RSDHLTQ 2781 RSQALTR 3281RSDERKR 3781 0.46 1219 AAGGAAAGG 2282 RSDHLTQ 2782 QSHNLAR 3282 RSDNLTQ3782 0.68 1220 AAGGAAAGG 2283 RSDHLTQ 2783 QSGNLAR 3283 RSDNLTQ 37830.175 1221 AAGGAAAGG 2284 RSDHLTQ 2784 QSSNLVR 3284 RSDNLTQ 3784 1.41222 CAGGAGGGC 2285 DRSHLAR 2785 RSDNLAR 3285 RSDNLRE 3785 0.155 1223ATGGACTTG 2286 RSDALTK 2786 DRSNLTR 3286 RSDALTQ 3786 7 1224 ATGGACTTG2287 RADALMV 2787 DRSNLTR 3287 RSDALTQ 3787 12 1227 GAATAGGGG 2288RSDHLSR 2788 RSDHLTK 3288 QSGNLAR 3788 25 1228 ACGGCCGAG 2289 RSDNLAR2789 DRSDLTR 3289 RSDDLTQ 3789 12 1229 AAGGGTGCG 2290 RSDELTR 2790MSHHLSR 3290 RSDNLTQ 3790 8.2 1230 AAGGGAGAC 2291 DRSNLTR 2791 QSGHLTR3291 RSDNLTQ 3791 0.383 1231 AAGGGAGAC 2292 DRSNLTR 2792 QRAHLER 3292RSDNLTQ 3792 0.213 1232 TGGGACCTG 2293 RSDALRE 2793 DRSNLTR 3293 RSDHLTT3793 0.113 1233 TGGGACCTG 2294 RSDALRE 2794 DRSNLTR 3294 RSDHLTT 37940.635 1234 GAGTAGGCA 2295 QSGSLTR 2795 RSDNLTK 3295 RSDNLAR 3795 0.1011236 GAGTAGGCA 2296 QSGSLTR 2796 RSDHLTT 3296 RSDNLAR 3796 0.065 1237GAAGGAGAG 2297 
RSDNLAR 2797 QRAHLER 3297 QSGNLAR 3797 0.065 1238CTGGATGTT 2298 QSSALAR 2798 TSGNLVR 3298 RSDALRE 3798 0.313 1239CAGGACGTG 2299 RSDALTR 2799 DPGNLVR 3299 RSDNLKD 3799 0.144 1240GGGGAGGCA 2300 QSGSLTR 2800 RSDNLTR 3300 RSDHLSR 3800 0.056 1241GAGGTGTCA 2301 QSHDLTK 2801 RSDALAR 3301 RSDNLAR 3801 0.027 1242GGGGTTGAA 2302 QSANLAR 2802 TSGSLTR 3302 RSDHLSR 3802 0.02 1243GGGGTTGAA 2303 QSANLAR 2803 QSSALTR 3303 RSDHLSR 3803 0.101 1244GTCGCGGTG 2304 RSDALTR 2804 RSDELQR 3304 DRSALAR 3804 0.044 1245GTCGCGGTG 2305 RSDALTR 2805 RSDELQR 3305 DSGSLTR 3805 0.102 1246GTGGTTGCG 2306 RSDELTR 2806 TSGSLTR 3306 RSDALTR 3806 0.051 1247GTGGTTGCG 2307 RSDELTR 2807 TSGALTR 3307 RSDALTR 3807 0.117 1248GTCTAGGTA 2308 QSGALTR 2808 RSDNLTT 3308 DRSALAR 3808 5.14 1249CCGGGAGCG 2309 RSDELTR 2809 QSGHLTR 3309 RSDTLRE 3809 0.26 1250GAAGGAGAG 2310 RSDNLAR 2810 QSGHLTR 3310 QSGNLAR 3810 0.31 1252CCGGCTGGA 2311 QRAHLER 2811 QSSDLTR 3311 RSDTLRE 3811 0.153 1253CCGGGAGCG 2312 RSDELTR 2812 QRAHLER 3312 RSDTLRE 3812 0.228 1255ACGTAGTAG 2313 RSDNLTT 2813 RSDNLTK 3313 RSDTLKQ 3813 0.69 1256GGGGAGGAT 2314 QSSNLAR 2814 RSDNLQR 3314 RSDHLSR 3814 2 1257 GGGGAGGAT2315 TTSNLAR 2815 RSDNLQR 3315 RSDHLSR 3815 1 1258 GGGGAGGAT 2316QSSNLRR 2816 RSDNLQR 3316 RSDHLSR 3816 2 1259 GAGTGTGTG 2317 RSDSLLR2817 DRDHLTR 3317 RSDNLAR 3817 1.5 1260 GAGTGTGTG 2318 RLDSLLR 2818DRDHLTR 3318 RSDNLAR 3818 1.8 1261 TGCGGGGCA 2319 QSGDLTR 2819 RSDHLTR3319 RRDTLHR 3819 0.2 1262 TGCGGGGCA 2320 QSGDLTR 2820 RSDHLTR 3320RLDTLGR 3820 3 1263 TGCGGGGCA 2321 QSGDLTR 2821 RSDHLTR 3321 DSGHLAS3821 21 1264 AAGTTGGTT 2322 TTSALTR 2822 RADALMV 3322 RSDNLTQ 3822 0.211265 AAGTTGGTT 2323 TTSALTR 2823 RSDALTT 3323 RSDNLTQ 3823 0.077 1266CAGGGTGGC 2324 DRSHLTR 2824 QSSHLAR 3324 RSDNLRE 3824 6.1 1267 TAGGCAGTC2325 DRSALTR 2825 QSGSLTR 3325 RSDNLTT 3825 6 1268 CTGTTGGCT 2326QSSDLTR 2826 RAEDLMV 3326 RSDALRE 3826 1.52 1269 CTGTTGGCT 2327 QSSDLTR2827 RSDALTT 3327 RSDALRE 3827 12.3 1270 TTGGATGGA 2328 QSGHLAR 2828TSGNLVR 3328 RSDALTK 3828 0.4 1271 
GTGGCACTG 2329 RSDALRE 2829 QSGSLTR3329 RSDALTR 3829 0.915 1272 CAGGAGTCC 2330 DRSSLTT 2830 RSDNLAR 3330RSDNLRE 3830 0.04 1273 CAGGAGTCC 2331 ERGDLTT 2831 RSDNLAR 3331 RSDNLRE3831 0.1 1274 GCATGGGAA 2332 QSANLSR 2832 RSDHLTT 3332 QSGSLTR 38320.306 1275 GCATGGGAA 2333 QRSNLVR 2833 RSDHLTT 3333 QSGSLTR 3833 0.3261276 TAGGAAGAG 2334 RSDNLAR 2834 QRSNLVR 3334 RSDNLTT 3834 0.685 1277GAAGAGGGG 2335 RSDHLAR 2835 RSDNLAR 3335 QSGNLTR 3835 0.421 1278GAGTAGGCA 2336 QSGSLTR 2836 RSDNLRT 3336 RSDNLAR 3836 0.019 1279GAGGTGTCA 2337 QSGDLRT 2837 RSDALAR 3337 RSDNLAR 3837 0.025 1282TCGGTCGCC 2338 ERGDLTR 2838 DPGALVR 3338 RSDELRT 3838 74.1 1287GTGGTAGGA 2339 QSGHLAR 2839 QSGALAR 3339 RSDALTR 3839 0.152 1288CAGGGTGGC 2340 DRSHLTR 2840 QSSHLAR 3340 RSDNLTE 3840 4.1 1289 TAGGCAGTC2341 DRSALTR 2841 QSGSLTR 3341 RSDNLTK 3841 1.37 1290 GTGGTGATA 2342QSGALTQ 2842 RSHALTR 3342 RSDALTR 3842 24.05 1291 GTGGTGATA 2343 QQASLNA2843 RSHALTR 3343 RSDALTR 3843 20.55 1292 TTGGATGGA 2344 QSGHLAR 2844TSGNLVR 3344 RSDALTT 3844 4.12 1293 AAGGTAGGT 2345 TSGHLVR 2845 QSGALAR3345 RSDNLTQ 3845 0.457 1294 AAGGTAGGT 2346 MSHHLSR 2846 QSGALAR 3346RSDNLTQ 3846 2.75 1295 CAGGAGTCC 2347 DRSSLTT 2847 RSDNLAR 3347 RSDNLTE3847 0.116 1296 CAGGAGTCC 2348 ERGDLTT 2848 RSDNLAR 3348 RSDNLTE 3848 371297 TAGGAAGAG 2349 RSDNLAR 2849 QRSNLVR 3349 RSDNLTK 3849 0.05 1298CAGGACGTG 2350 RSDLATR 2850 DPGNLVR 3350 RSDNLTE 3850 0.05 1300GTCTAGGTA 2351 QSGALTR 2851 RSDNLTK 3351 DRSALAR 3851 0.46 1302CCGGCTGGA 2352 QSGHLTR 2852 QSSDLTR 3352 RSDTLRE 3852 0.05 1303TAGGAGTTT 2353 QRSALAS 2853 RSDNLAR 3353 RSDNLTT 3853 0.088 1306CTGGCCTTG 2354 RSDALTT 2854 DCRDLAR 3354 RSDALRE 3854 2.285 1308TGGGCAGCC 2355 ERGTLAR 2855 QSGSLTR 3355 RSDHLTT 3855 0.305 1309TAGGAGTTT 2356 QSSALAS 2856 RSDNLAR 3356 RSDNLTT 3856 0.184 1310TAGGAGTTT 2357 TTSALAS 2857 RSDNLAR 3357 RSDNLTT 3857 0.075 1311TGGGCAGCC 2358 ERGDLAR 2858 QSGSLTR 3358 RSDHLTT 3858 0.91 1312GGGGCGTGA 2359 QSGHLTK 2859 RSDELQR 3359 RSDHLSR 3859 0.23 1313GGGGCGTGA 2360 QSGHLTT 2860 
RSDELQR 3360 RSDHLSR 3860 0.09 1314GTACAGTAG 2361 RSDNLTT 2861 RSDNLRE 3361 QSSSLVR 3861 3.09 1315GTACAGTAG 2362 RSDNLTT 2862 RSDNLTE 3362 QSSSLVR 3862 9.27 1318ATGGTGTGT 2363 TSSHLAS 2863 RSDALAR 3363 RSDALAQ 3863 0.048 1319ATGGTGTGT 2364 MSHHLTT 2864 RSDALAR 3364 RSDALAQ 3864 0.228 1320TTGGGAGAG 2365 RSDNLAR 2865 QRAHLER 3365 RSDALTT 3865 0.044 1321TTGGGAGAG 2366 RSDNLAR 2866 QRAHLER 3366 RADALMV 3866 0.127 1322GTGGGAATA 2367 QSGALTQ 2867 QSGHLTR 3367 RSDALTR 3867 0.799 1323GTGGGAATA 2368 QLTGLNQ 2868 QSGHLTR 3368 RSDALTR 3868 0.744 1324GTGGGAATA 2369 QQASLNA 2869 QSHHLTR 3369 RSDALTR 3869 18.52 1325TTGGTTGGT 2370 TSGHLVR 2870 TSGSLTR 3370 RSDALTK 3870 0.306 1326TTGGTTGGT 2371 TSGHLVR 2871 QSSALTR 3371 RSDALTK 3871 4.385 1327TTGGTTGGT 2372 TSGHLVR 2872 TSGSLTR 3372 RSDALTT 3872 0.566 1328TTGGTTGGT 2373 TSGHLVR 2873 QSSALTR 3373 RSDALTT 3873 7.95 1329CTGGCCTGG 2374 RSDHLTT 2874 DRSDLTR 3374 RSDALRE 3874 0.68 1330GAGGTGTGA 2375 QSGHLTT 2875 RSDALTR 3375 RSDNLAR 3875 0.175 1331CTGGCCTGG 2376 RSDHLTT 2876 DCRDLAR 3376 RSDALRE 3876 0.388 1334CCGGCGCTG 2377 RSDALRE 2877 RSSDLTR 3377 RSDDLRE 3877 0.31 1335GACGCTGGC 2378 DRSHLTR 2878 QSSDLTR 3378 DSSNLTR 3878 1.4 1336 CGGGCTGGA2379 QSGHLAR 2879 QSSDLTR 3379 RSDHLAE 3879 1.4 1337 CGGGCTGGA 2380QSSHLAR 2880 QSSDLTR 3380 RSDHLAE 3880 0.235 1338 GGGATGGCG 2381 RSDELTR2881 RSDALTQ 3381 RSDHLSR 3881 1.04 1339 GGGATGGCG 2382 RSDELTR 2882RSDSLTQ 3382 RSDHLSR 3882 0.569 1340 GGGATGGCG 2383 RSDELTR 2883 RSDALTQ3383 RSHHLSR 3883 0.751 1341 GGGATGGCG 2384 RSDELTR 2884 RSDSLTQ 3384RSHHLSR 3884 4.1 1342 CAGGCGCAG 2385 RSDNLRE 2885 RSSDLTR 3385 RSDNLTE3885 0.68 1343 CAGGCGCAG 2386 RSDNLTT 2886 RTSTLTR 3386 RSDNLTE 388637.04 1344 CCGGGCGAC 2387 DRSNLTR 2887 DRSHLAR 3387 RSDTLRE 3887 2.281346 GATGTGTGA 2388 QSGHLTT 2888 RSDALAR 3388 TSANLSR 3888 0.153 1347CAGTGAATG 2389 RSDALTS 2889 QSHHLTT 3389 RSDNLTE 3889 8.23 1348GGGTCACTG 2390 RSDALTA 2890 QAATLTT 3390 RSDHLSR 3890 2.58 1350CAGTGAATG 2391 RSDALTQ 2891 QSGHLTT 3391 RSDNLTE 3891 74.1 
1351GGGTCACTG 2392 RSDALRE 2892 QSHDLTK 3392 RSDHLSR 3892 0.234 1352GTGTGGGTC 2393 DRSALAR 2893 RSDHLTT 3393 RSDALTR 3893 0.023 1353CTGGCGAGA 2394 QSGHLNQ 2894 RSDELQR 3394 RSDALRE 3894 56.53 1354CTGGCGAGA 2395 KNWKLQA 2895 RSDELQR 3395 RSDALRE 3895 20.85 1355GCTTTGGCA 2396 QSGSLTR 2896 RSDALTT 3396 QSSDLTR 3896 0.172 1356GCTTTGGCA 2397 QSGSLTR 2897 RADALMV 3397 QSSDLTR 3897 0.034 1357GACTTGGTA 2398 QSSSLVR 2898 RSDALTT 3398 DRSNLTR 3898 0.032 1358GACTTGGTA 2399 QSSSLVR 2899 RADALMV 3399 DRSNLTR 3899 0.05 1360CAGTTGTGA 2400 QSGHLTT 2900 RADALMV 3400 RSDNLTE 3900 41.7 1361AAGGAAAAA 2401 QKTNLDT 2901 QSGNLQR 3401 RSDNLTQ 3901 0.835 1362AAGGAAAAA 2402 QSGNLNQ 2902 QSGNLQR 3402 RSDNLTQ 3902 0.332 1363AAGGAAAAA 2403 QKTNLDT 2903 QRSNLVR 3403 RSDNLTQ 3903 74.1 1364ATGGGTGAA 2404 QSANLSR 2904 QSSHLAR 3404 RSDALAQ 3904 1.22 1365ATGGGTGAA 2405 QRSNLVR 2905 QSSLLAR 3405 RSDALAQ 3905 0.152 1366ATGGGTGAA 2406 QSANLSR 2906 TSGHLVR 3406 RSDALAQ 3906 22.63 1367ATGGGTGAA 2407 QRSNLVR 2907 TSGHLVR 3407 RSDALAQ 3907 1.028 1368CTGGGAGAT 2408 QSSNLAR 2908 QRAHLER 3408 RSDALRE 3908 0.051 1369CTGGGAGAT 2409 QSSNLAR 2909 QSGHLTR 3409 RSDALRE 3909 0.227 1373GTGGTGGGC 2410 DRSHLTR 2910 RSDALSR 3410 RSDALTR 3910 0.025 1374CCGGCGGTG 2411 RSDALTR 2911 RSDELQR 3411 RSDELRE 3911 0.003 1375CCGGCGGTG 2412 RSDALTR 2912 RSDDLQR 3412 RSDELRE 3912 0.008 1376CCGGCGGTG 2413 RSDALTR 2913 RSDERKR 3413 RSDELRE 3913 0.858 1377CCGGCGGTG 2414 RSDALTR 2914 RSDELQR 3414 RSDDLRE 3914 0.012 1378CCGGCGGTG 2415 RSDALTR 2915 RSDDLQR 3415 RSDDLRE 3915 0.012 1379CCGGCGGTG 2416 RSDALTR 2916 RSDERKR 3416 RSDDLRE 3916 0.25 1380GCCGACGGT 2417 QSSHLTR 2917 DRSNLTR 3417 ERGDLTR 3917 0.076 1381GCCGACGGT 2418 QSSHLTR 2918 DPGNLVR 3418 ERGDLTR 3918 0.23 1382GCCGACGGT 2419 QSSHLTR 2919 DRSNLTR 3419 DCRDLAR 3919 3.1 1383 GCCGACGGT2420 QSSHLTR 2920 DPGNLVR 3420 DCRDLAR 3920 1.74 1384 GGTGTGGGC 2421DRSHLTR 2921 RSDALSR 3421 MSHHLSR 3921 0.013 1385 TGGGCAAGA 2422 QSGHLNQ2922 QSGSLTR 3422 RSDHLTT 3922 0.229 1386 TGGGCAAGA 2423 
ENWKLQA 2923QSGSLTR 3423 RSDHLTT 3923 0.193 1389 CTGGCCTGG 2424 RSDHLTT 2924 DCRDLAR3424 RSDALRE 3924 0.175 1393 TGGGAAGCT 2425 QSSDLRR 2925 QSGNLAR 3425RSDHLTT 3925 0.1 1394 TGGGAAGCT 2426 QSSDLRR 2926 QSGNLAR 3426 RSDHLTK3926 0.04 1395 GAAGAGGGA 2427 QSGHLQR 2927 RSDNLAR 3427 QSGNLAR 39270.025 1396 GAAGAGGGA 2428 QRAHLAR 2928 RSDNLAR 3428 QSGNLAR 3928 0.1071397 GAAGAGGGA 2429 QSSHLAR 2929 RSDNLAR 3429 QSGNLAR 3929 0.14 1398TAATGGGGG 2430 RSDHLSR 2930 RSDHLTT 3430 QSGNLRT 3930 0.065 1399TGGGAGTGT 2431 TKQHLKT 2931 RSDNLAR 3431 RSDHLTT 3931 0.1 1400 CCGGGTGAG2432 RSDNLAR 2932 QSSHLAR 3432 RSDDLRE 3932 0.371 1401 GAGTTGGCC 2433ERGTLAR 2933 RADALMV 3433 RSDNLAR 3933 0.167 1402 CTGGAGTTG 2434 RGDALTS2934 RSDNLAR 3434 RSDALRE 3934 0.15 1403 ATGGCAATG 2435 RSDALTQ 2935QSGSLTR 3435 RSDALTQ 3935 0.07 1404 GAGGCAGGG 2436 RSDHLSR 2936 QSGSLTR3436 RSDNLAR 3936 0.022 1405 GAGGCAGGG 2437 RSDHLSR 2937 QSGDLTR 3437RSDNLAR 3937 0.045 1406 GAAGCGGAG 2438 RSDNLAR 2938 RSDELTR 3438 QSGNLAR3938 0.025 1407 GCGGGCGCA 2439 QSGSLTR 2939 DRSHLAR 3439 RSDERKR 39390.585 1408 CCGGCAGGG 2440 RSDHLSR 2940 QSGSLTR 3440 RSDELRE 3940 0.3051409 CCGGCAGGG 2441 RSDHLSR 2941 QSGSLTR 3441 RSDDLRE 3941 0.153 1410CCGGCGGCG 2442 RSDELTR 2942 RSDELQR 3442 RSDELRE 3942 0.814 1411TGAGGCGAG 2443 RSDNLAR 2943 DRSHLAR 3443 QSGHLTK 3943 0.282 1412CTGGCCGTG 2444 RSDSLLR 2944 ERGTLAR 3444 RSDALRE 3944 0.172 1413CTGGCCGCG 2445 RSDELTR 2945 DRSDLTR 3445 RSDALRE 3945 0.152 1414CTGGCCGCG 2446 RSDELTR 2946 ERGTLAR 3446 RSDALRE 3946 0.914 1415GCGGCCGAG 2447 RSDNLAR 2947 DRSDLTR 3447 RSDELQR 3947 0.102 1416GCGGCCGAG 2448 RSDNLAR 2948 ERGTLAR 3448 RSDELQR 3948 0.153 1417GAGTTGGCC 2449 ERGTLAR 2949 RGDALTS 3449 RSDNLAR 3949 1.397 1418CTGGAGTTG 2450 RADALMV 2950 RSDNLAR 3450 RSDALRE 3950 0.241 1422GGGTCGGCG 2451 RSDELTR 2951 RSDDLTT 3451 RSDHLSR 3951 0.064 1423GGGTCGGCG 2452 RSDELTR 2952 RSDDLTK 3452 RSDHLSR 3952 0.034 1424CAGGGCCCG 2453 RSDELRE 2953 DRSHLAR 3453 RSDNLRE 3953 1.37 1427CAGGGCCCG 2454 RSDDLRE 2954 DRSHLAR 
3454 RSDNLTE 3954 0.271 1428TGAGGCGAG 2455 RSDNLAR 2955 DRSHLAR 3455 QSVHLQS 3955 0.102 1429TGAGGCGAG 2456 RSDNLAR 2956 DRSHLAR 3456 QSGHLTT 3956 0.074 1430TCGGCCGCC 2457 ERGTLAR 2957 DRSDLTR 3457 RSDDLTK 3957 0.352 1431TCGGCCGCC 2458 ERGTLAR 2958 DRSDLTR 3458 RSDDLAS 3958 6.17 1432TCGGCCGCC 2459 ERGTLAR 2959 ERGTLAR 3459 RSDDLTK 3959 1.778 1434CTGGCCGTG 2460 RSDSLLR 2960 DRSDLTR 3460 RSDALRE 3960 0.051 1435TAATGGGGG 2461 RSDHLSR 2961 RSDHLTT 3461 QSGNLTK 3961 0.057 1436TGGGAGTGT 2462 TSDHLAS 2962 RSDNLAR 3462 RSDHLTT 3962 0.026 1439GGAGTGTTA 2463 QRSALAS 2963 RSDALAR 3463 QSGHLQR 3963 0.075 1440GGAGTGTTA 2464 QSGALTK 2964 RSDALAR 3464 QSGHLQR 3964 0.035 1441ATAGCTGGG 2465 RSDHLSR 2965 QSSDLTR 3465 QSGALTQ 3965 0.262 1442TGCTGGGCC 2466 ERGTLAR 2966 RSDHLTT 3466 DRSHLTK 3966 0.36 1443TGGAAGGAA 2467 QSGNLAR 2967 RSDNLTQ 3467 RSHHLTT 3967 0.22 1444TGGAAGGAA 2468 QSGNLAR 2968 RSDNLTQ 3468 RSSHLTT 3968 0.09 1445TGGAAGGAA 2469 QSGNLAR 2969 RLDNLTA 3469 RSHHLTT 3969 0.182 1446TGGAAGGAA 2470 QSGNLAR 2970 RLDNLTA 3470 RSSHLTT 3970 0.42 1454GGAGAGGCT 2471 QSSDLRR 2971 RSDNLAR 3471 QSGHLQR 3971 0.01 1455CGGGATGAA 2472 QSANLSR 2972 TSGNLVR 3472 RSDHLRE 3972 0.043 1456GGAGAGGCT 2473 QSSDLRR 2973 RSDNLAR 3473 QRAHLAR 3973 0.016 1457GCAGAGGAA 2474 QSANLSR 2974 RSDNTAR 3474 QSGSLTR 3974 0.014 1460TTGGGGGAG 2475 RSDNLAR 2975 RSDHLTR 3475 RADALMV 3975 0.007 1461GACGAGGAG 2476 RSANLAR 2976 RSDNLAR 3476 DRSNLTR 3976 0.014 1462CGGGATGAA 2477 QSGNLAR 2977 TSGNLVR 3477 RSDHLRE 3977 0.05 1463GAGGCTGTT 2478 TTSALTR 2978 QSSDLTR 3478 RSDNLAR 3978 0.003 1464GACGAGGAG 2479 RSDNLAR 2979 RSDNLTR 3479 DRSNLTR 3979 0.002 1465CTGGGAGTT 2480 TTSALTR 2980 QSGHLQR 3480 RSDALRE 3980 0.018 1466CTGGGAGTT 2481 NRATLAR 2981 QSGHLQR 3481 RSDALRE 3981 0.017 1468GGTGATGTC 2482 DRSALTR 2982 TSGNLVR 3482 MSHHLSR 3982 0.08 1469GGTGATGTC 2483 DRSALTR 2983 TSGNLVR 3483 TSGHLVR 3983 0.28 1470GGTGATGTC 2484 DRSALTR 2984 TSGNLVR 3484 QRAHLER 3984 0.156 1471CTGGTTGGG 2485 RSDHLSR 2985 QSSALTR 3485 RSDALRE 3985 0.09 
1472TTGAAGGTT 2486 TTSALTR 2986 RSDNLTQ 3486 RADALMV 3986 3.22 1473TTGAAGGTT 2487 TTSALTR 2987 RSDNLTQ 3487 RSDSLTT 3987 0.47 1474TTGAAGGTT 2488 QSSALAR 2988 RSDNLTQ 3488 RADALMV 3988 1.39 1475TTGAAGGTT 2489 QSSALAR 2989 RSDNLTQ 3489 RLHSLTT 3989 0.39 1476TTGAAGGTT 2490 QSSALAR 2990 RSDNLTQ 3490 RSDSLTT 3990 0.305 1477GCAGCCCGG 2491 RSDHLRE 2991 DRSDLTR 3491 QSGSLTR 3991 2.31 1479GAAAGTTCA 2492 QSHDLTK 2992 MSHHLTQ 3492 QSGNLAR 3992 37.04 1480GAAAGTTCA 2493 NKTDLGK 2993 TSGHLVQ 3493 QSGNLAR 3993 62.5 1481GAAAGTTCA 2494 NKTDLGK 2994 TSDHLAS 3494 RSDELRE 3994 37.04 1482CCGTGTGAC 2495 DRSNLTR 2995 TSDHLAS 3495 RSDELRE 3995 111.1 1483CCGTGTGAC 2496 DRSNLTR 2996 MSHHLTT 3496 RSDELRE 3996 20.8 1484GAAGTGGTA 2497 QSSSLVR 2997 RSDALSR 3497 QSGNLAR 3997 0.01 1485AAGTGAGCT 2498 QSSDLRR 2998 QSGHLTT 3498 RSDNLTQ 3998 1.537 1486GGGTTTGAC 2499 DRSNLTR 2999 TTSALAS 3499 RSDHLSR 3999 0.085 1487TTGAAGGTT 2500 TTSALTR 3000 RSDNLTQ 3500 RLHSLTT 4000 0.188 1488AAGTGGTAG 2501 QSSDLRR 3001 QSGHLTT 3501 RLDNRTQ 4001 5.64 1490CTGGTTGGG 2502 RSDHLSR 3002 TSGSLTR 3502 RSDALRE 4002 0.04 1491AAGGGTTCA 2503 NKTDLGK 3003 DSSKLSR 3503 RLDNRTA 4003 4.12 1492AAGTGGTAG 2504 RSDNLTT 3004 RSDHLTT 3504 RSDNLTQ 4004 1.37 1493AAGTGGTAG 2505 RSDNLTT 3005 RSDHLTT 3505 RLDNRTQ 4005 15.09 1494GGGTTTGAC 2506 DRSNLTR 3006 QRSALAS 3506 RSDHLSR 4006 0.255 1496TTGGGGGAG 2507 RSDNLAR 3007 RSDHLTR 3507 RSDALTT 4007 0.065 1497GAGGCTCTT 2508 QSSALAR 3008 QSSDLTR 3508 RSDNLAR 4008 0.007 1498GAGGTTGAT 2509 QSSNLAR 3009 QSSALTR 3509 RSDNLAR 4009 0.101 1499GAGGTTGAT 2510 QSSNLAR 3010 TSGALTR 3510 RSDNLAR 4010 0.02 1500GCAGAGGAA 2511 QSGNLAR 3011 RSDNLAR 3511 QSGSLTR 4011 0.003 1522GCAATGGGT 2512 TSGHLVR 3012 RSDALTQ 3512 QSGDLTR 4012 0.08

[0120] TABLE 6
FINGER (N→C)
TRIPLET (5′→3′) F1 F2 F3
AGG RXDHXXQ
ATG RXDAXXQ
CGG RXDHXXE
GAA QXGNXXR
GAC DXSNXXR DXSNXXR
GAG RXDNXXR RXSNXXR RXDNXXR RXDNXXR
GAT QXSNXXR TXGNXXR TXSNXXR TXGNXXR
GCA QXGSXXR QXGDXXR
GCC EXGTXXR
GCG RXDEXXR RXDEXXR RXDEXXR RXDTXXK
GCT QXSDXXR TXGEXXR QXSDXXR
GGA QXGHXXR QXAHXXR
GGC DXSHXXR DXSHXXR
GGG RXDHXXR RXDHXXR RXDHXXR RXDHXXK
GGT TXGHXXR
GTA QXGSXXR QXATXXR
GTG RXDAXXR RXDAXXR RXDAXXR RXDSXXR
TAG RXDNXXT
TCG RXDDXXK
TGT TXDHXXS
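The patterns in Table 6 can be read as seven-residue templates. The following is a minimal sketch of how a candidate recognition sequence might be checked against such a template. It assumes that X stands for any amino acid, as in the zinc finger motif given in the Background; the function names and the small pattern dictionary are hypothetical, and finger positions are deliberately not tracked.

```python
def matches_consensus(pattern: str, helix: str) -> bool:
    """Return True if a recognition sequence fits a Table 6 style pattern.

    Assumption: 'X' in the pattern matches any single amino acid.
    """
    if len(pattern) != len(helix):
        return False
    return all(p == "X" or p == h for p, h in zip(pattern, helix))


# A few patterns copied from Table 6 (illustrative subset only).
PATTERNS = {
    "GAA": ["QXGNXXR"],
    "GAG": ["RXDNXXR", "RXSNXXR"],
    "GGT": ["TXGHXXR"],
}


def fits_triplet(triplet: str, helix: str) -> bool:
    """Return True if the sequence fits any listed pattern for the triplet."""
    return any(matches_consensus(p, helix) for p in PATTERNS.get(triplet, []))


print(fits_triplet("GAG", "RSDNLAR"))  # True: fits RXDNXXR
print(fits_triplet("GAG", "RSDHLSR"))  # False: H at the fourth residue, not N
```

A check of this kind only tests agreement with the tabulated pattern; it says nothing about measured binding affinity.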

What is claimed is:
 1. A zinc finger protein that binds to a target site, said zinc finger protein comprising a first (F1), a second (F2), and a third (F3) zinc finger, ordered F1, F2, F3 from N-terminus to C-terminus, said target site comprising, in 3′ to 5′ direction, a first (S1), a second (S2), and a third (S3) target subsite, each target subsite having the nucleotide sequence GNN, wherein
if S1 comprises GAA, F1 comprises the amino acid sequence QRSNLVR;
if S2 comprises GAA, F2 comprises the amino acid sequence QSGNLAR;
if S3 comprises GAA, F3 comprises the amino acid sequence QSGNLAR;
if S1 comprises GAG, F1 comprises the amino acid sequence RSDNLAR;
if S2 comprises GAG, F2 comprises the amino acid sequence RSDNLAR;
if S3 comprises GAG, F3 comprises the amino acid sequence RSDNLTR;
if S1 comprises GAC, F1 comprises the amino acid sequence DRSNLTR;
if S2 comprises GAC, F2 comprises the amino acid sequence DRSNLTR;
if S3 comprises GAC, F3 comprises the amino acid sequence DRSNLTR;
if S1 comprises GAT, F1 comprises the amino acid sequence QSSNLAR;
if S2 comprises GAT, F2 comprises the amino acid sequence TSGNLVR;
if S3 comprises GAT, F3 comprises the amino acid sequence TSANLSR;
if S1 comprises GGA, F1 comprises the amino acid sequence QSGHLAR;
if S2 comprises GGA, F2 comprises the amino acid sequence QSGHLQR;
if S3 comprises GGA, F3 comprises the amino acid sequence QSGHLQR;
if S1 comprises GGG, F1 comprises the amino acid sequence RSDHLAR;
if S2 comprises GGG, F2 comprises the amino acid sequence RSDHLSR;
if S3 comprises GGG, F3 comprises the amino acid sequence RSDHLSR;
if S1 comprises GGC, F1 comprises the amino acid sequence DRSHLTR;
if S2 comprises GGC, F2 comprises the amino acid sequence DRSHLAR;
if S1 comprises GGT, F1 comprises the amino acid sequence QSSHLTR;
if S2 comprises GGT, F2 comprises the amino acid sequence TSGHLSR;
if S3 comprises GGT, F3 comprises the amino acid sequence TSGHLVR;
if S1 comprises GCA, F1 comprises the amino acid sequence QSGSLTR;
if S2 comprises GCA, F2 comprises QSGDLTR;
if S3 comprises GCA, F3 comprises QSGDLTR;
if S1 comprises GCG, F1 comprises the amino acid sequence RSDDLTR;
if S2 comprises GCG, F2 comprises the amino acid sequence RSDDLQR;
if S3 comprises GCG, F3 comprises the amino acid sequence RSDDLTR;
if S1 comprises GCC, F1 comprises the amino acid sequence ERGTLAR;
if S2 comprises GCC, F2 comprises the amino acid sequence DRSDLTR;
if S3 comprises GCC, F3 comprises the amino acid sequence DRSDLTR;
if S1 comprises GCT, F1 comprises the amino acid sequence QSSDLTR;
if S2 comprises GCT, F2 comprises the amino acid sequence QSSDLTR;
if S3 comprises GCT, F3 comprises the amino acid sequence QSSDLQR;
if S1 comprises GTA, F1 comprises the amino acid sequence QSGALTR;
if S2 comprises GTA, F2 comprises the amino acid sequence QSGALAR;
if S1 comprises GTG, F1 comprises the amino acid sequence RSDALTR;
if S2 comprises GTG, F2 comprises the amino acid sequence RSDALSR;
if S3 comprises GTG, F3 comprises the amino acid sequence RSDALTR;
if S1 comprises GTC, F1 comprises the amino acid sequence DRSALAR;
if S2 comprises GTC, F2 comprises the amino acid sequence DRSALAR; and
if S3 comprises GTC, F3 comprises the amino acid sequence DRSALAR.
 2. The zinc finger protein of claim 1, wherein S1 comprises GAA and F1 comprises the amino acid sequence QRSNLVR.
 3. The zinc finger protein of claim 1, wherein S2 comprises GAA and F2 comprises the amino acid sequence QSGNLAR.
 4. The zinc finger protein of claim 1, wherein S3 comprises GAA and F3 comprises the amino acid sequence QSGNLAR.
 5. The zinc finger protein of claim 1, wherein S1 comprises GAG and F1 comprises the amino acid sequence RSDNLAR.
 6. The zinc finger protein of claim 1, wherein S2 comprises GAG and F2 comprises the amino acid sequence RSDNLAR.
 7. The zinc finger protein of claim 1, wherein S3 comprises GAG and F3 comprises the amino acid sequence RSDNLTR.
 8. The zinc finger protein of claim 1, wherein S1 comprises GAC and F1 comprises the amino acid sequence DRSNLTR.
 9. The zinc finger protein of claim 1, wherein S2 comprises GAC and F2 comprises the amino acid sequence DRSNLTR.
 10. The zinc finger protein of claim 1, wherein S3 comprises GAC and F3 comprises the amino acid sequence DRSNLTR.
 11. The zinc finger protein of claim 1, wherein S1 comprises GAT and F1 comprises the amino acid sequence QSSNLAR.
 12. The zinc finger protein of claim 1, wherein S2 comprises GAT and F2 comprises the amino acid sequence TSGNLVR.
 13. The zinc finger protein of claim 1, wherein S3 comprises GAT and F3 comprises the amino acid sequence TSANLSR.
 14. The zinc finger protein of claim 1, wherein S1 comprises GGA and F1 comprises the amino acid sequence QSGHLAR.
 15. The zinc finger protein of claim 1, wherein S2 comprises GGA and F2 comprises the amino acid sequence QSGHLQR.
 16. The zinc finger protein of claim 1, wherein S3 comprises GGA and F3 comprises the amino acid sequence QSGHLQR.
 17. The zinc finger protein of claim 1, wherein S1 comprises GGG and F1 comprises the amino acid sequence RSDHLAR.
 18. The zinc finger protein of claim 1, wherein S2 comprises GGG and F2 comprises the amino acid sequence RSDHLSR.
 19. The zinc finger protein of claim 1, wherein S3 comprises GGG and F3 comprises the amino acid sequence RSDHLSR.
 20. The zinc finger protein of claim 1, wherein S1 comprises GGC and F1 comprises the amino acid sequence DRSHLTR.
 21. The zinc finger protein of claim 1, wherein S2 comprises GGC and F2 comprises the amino acid sequence DRSHLAR.
 22. The zinc finger protein of claim 1, wherein S1 comprises GGT and F1 comprises the amino acid sequence QSSHLTR.
 23. The zinc finger protein of claim 1, wherein S2 comprises GGT and F2 comprises the amino acid sequence TSGHLSR.
 24. The zinc finger protein of claim 1, wherein S3 comprises GGT and F3 comprises the amino acid sequence TSGHLVR.
 25. The zinc finger protein of claim 1, wherein S1 comprises GCA and F1 comprises the amino acid sequence QSGSLTR.
 26. The zinc finger protein of claim 1, wherein S2 comprises GCA and F2 comprises the amino acid sequence QSGDLTR.
 27. The zinc finger protein of claim 1, wherein S3 comprises GCA and F3 comprises the amino acid sequence QSGDLTR.
 28. The zinc finger protein of claim 1, wherein S1 comprises GCG and F1 comprises the amino acid sequence RSDDLTR.
 29. The zinc finger protein of claim 1, wherein S2 comprises GCG and F2 comprises the amino acid sequence RSDDLQR.
 30. The zinc finger protein of claim 1, wherein S3 comprises GCG and F3 comprises the amino acid sequence RSDDLTR.
 31. The zinc finger protein of claim 1, wherein S1 comprises GCC and F1 comprises the amino acid sequence ERGTLAR.
 32. The zinc finger protein of claim 1, wherein S2 comprises GCC and F2 comprises the amino acid sequence DRSDLTR.
 33. The zinc finger protein of claim 1, wherein S3 comprises GCC and F3 comprises the amino acid sequence DRSDLTR.
 34. The zinc finger protein of claim 1, wherein S1 comprises GCT and F1 comprises the amino acid sequence QSSDLTR.
 35. The zinc finger protein of claim 1, wherein S2 comprises GCT and F2 comprises the amino acid sequence QSSDLTR.
 36. The zinc finger protein of claim 1, wherein S3 comprises GCT and F3 comprises the amino acid sequence QSSDLQR.
 37. The zinc finger protein of claim 1, wherein S1 comprises GTA and F1 comprises the amino acid sequence QSGALTR.
 38. The zinc finger protein of claim 1, wherein S2 comprises GTA and F2 comprises the amino acid sequence QSGALAR.
 39. The zinc finger protein of claim 1, wherein S1 comprises GTG and F1 comprises the amino acid sequence RSDALTR.
 40. The zinc finger protein of claim 1, wherein S2 comprises GTG and F2 comprises the amino acid sequence RSDALSR.
 41. The zinc finger protein of claim 1, wherein S3 comprises GTG and F3 comprises the amino acid sequence RSDALTR.
 42. The zinc finger protein of claim 1, wherein S1 comprises GTC and F1 comprises the amino acid sequence DRSALAR.
 43. The zinc finger protein of claim 1, wherein S2 comprises GTC and F2 comprises the amino acid sequence DRSALAR.
 44. The zinc finger protein of claim 1, wherein S3 comprises GTC and F3 comprises the amino acid sequence DRSALAR.
 45. A polypeptide comprising a zinc finger protein according to claim 1.
 46. A polypeptide according to claim 45, further comprising at least one functional domain.
 47. A polynucleotide encoding a zinc finger protein according to claim 1.
 48. A polynucleotide encoding a polypeptide according to claim 45.
 49. A polynucleotide encoding a polypeptide according to claim 46.
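
The position-dependent selection recited in claim 1 can be illustrated with a short lookup. This is a minimal sketch only, not part of the claims: the names POSITION_SPECIFIC and design_fingers are hypothetical, and the dictionary copies just a handful of the triplet entries from claim 1. It assumes a nine-nucleotide target site written 5′ to 3′, so that the site reads S3, S2, S1 and finger F1 is assigned to the 3′-most subsite.

```python
# Illustrative subset of claim 1: triplet -> (F1, F2, F3) sequences.
# None marks a position for which claim 1 recites no sequence.
POSITION_SPECIFIC = {
    "GAA": ("QRSNLVR", "QSGNLAR", "QSGNLAR"),
    "GAG": ("RSDNLAR", "RSDNLAR", "RSDNLTR"),
    "GAT": ("QSSNLAR", "TSGNLVR", "TSANLSR"),
    "GGG": ("RSDHLAR", "RSDHLSR", "RSDHLSR"),
    "GCA": ("QSGSLTR", "QSGDLTR", "QSGDLTR"),
    "GTA": ("QSGALTR", "QSGALAR", None),
}


def design_fingers(target_site: str) -> tuple:
    """Return (F1, F2, F3) sequences for a 9-nt target written 5' to 3'.

    Assumption: the site is read 5'-S3 S2 S1-3', so F1 pairs with the
    last triplet, F2 with the middle one, and F3 with the first.
    """
    site = target_site.upper()
    if len(site) != 9:
        raise ValueError("target site must be exactly nine nucleotides")
    s3, s2, s1 = site[0:3], site[3:6], site[6:9]
    fingers = []
    for index, subsite in enumerate((s1, s2, s3)):
        entry = POSITION_SPECIFIC.get(subsite)
        helix = entry[index] if entry is not None else None
        if helix is None:
            raise ValueError(f"no F{index + 1} sequence listed for {subsite}")
        fingers.append(helix)
    return tuple(fingers)


print(design_fingers("GAAGAGGGG"))
```

For the site GAAGAGGGG the sketch picks the F1 entry for GGG, the F2 entry for GAG, and the F3 entry for GAA. The same triplet can therefore yield different sequences depending on where it sits in the target site.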